Search Results

  • Item
    Simultaneous statistical inference for epigenetic data
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2015) Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
    Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify, we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.
  • Item
    Uncertainty quantification for the family-wise error rate in multivariate copula models
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2013) Stange, Jens; Bodnar, Taras; Dickhaus, Thorsten
    We derive confidence regions for the realized family-wise error rate (FWER) of certain multiple tests which are empirically calibrated at a given (global) level of significance. To this end, we regard the FWER as a derived parameter of a multivariate parametric copula model. It turns out that the resulting confidence regions are typically very much concentrated around the target FWER level, while generic multiple tests with fixed thresholds are in general not FWER-exhausting. Since FWER level exhaustion and optimization of power are equivalent for the classes of multiple test problems studied in this paper, the aforementioned findings militate strongly in favour of estimating the dependency structure (i.e., copula) and incorporating it in a multivariate multiple test procedure. We illustrate our theoretical results by considering two particular classes of multiple test problems of practical relevance in detail, namely, multiple tests for components of a mean vector and multiple support tests.
  • Item
    Utilizing anatomical information for signal detection in functional magnetic resonance imaging
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2021) Neumann, André; Peitek, Norman; Brechmann, André; Tabelow, Karsten; Dickhaus, Thorsten
    We are considering the statistical analysis of functional magnetic resonance imaging (fMRI) data. As demonstrated in previous work, grouping voxels into regions (of interest) and carrying out a multiple test for signal detection on the basis of these regions typically leads to a higher sensitivity when compared with voxel-wise multiple testing approaches. In the case of a multi-subject study, we propose to define the regions for each subject separately based on their individual brain anatomy, represented, e.g., by so-called Aparc labels. The aggregation of the subject-specific evidence for the presence of signals in the different regions is then performed by means of a combination function for p-values. We apply the proposed methodology to real fMRI data and demonstrate that our approach can perform comparably to a two-stage approach for which two independent experiments are needed, one for defining the regions and one for actual signal detection.
  • Item
    On the Simes inequality in elliptical models
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2014) Bodnar, Taras; Dickhaus, Thorsten
    We provide necessary and sufficient conditions for the validity of the inequality of Simes (1986) in models with elliptical dependencies. Necessary conditions are presented in terms of sufficient conditions for the reverse Simes inequality. One application of our main results concerns the problem of model misspecification, in particular the case that the assumption of Gaussianity of test statistics is violated. Since our sufficient conditions require nonnegativity of correlation coefficients between test statistics, we also develop exact tests for vectors of correlation coefficients.
  • Item
    Semi-supervised novelty detection
    (Berlin : Weierstraß-Institut für Angewandte Analysis und Stochastik, 2009) Blanchard, Gilles; Lee, Gyemin; Scott, Clayton
    A common setting for novelty detection assumes that labeled examples from the nominal class are available, but that labeled examples of novelties are unavailable. The standard (inductive) approach is to declare novelties where the nominal density is low, which reduces the problem to density level set estimation. In this paper, we consider the setting where an unlabeled and possibly contaminated sample is also available at learning time. We argue that novelty detection in this semi-supervised setting is naturally solved by a general reduction to a binary classification problem. In particular, a detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, semi-supervised novelty detection (SSND) yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution on novelties. Therefore, in novelty detection, unlabeled data have a substantial impact on the theoretical properties of the decision rule. We validate the practical utility of SSND with an extensive experimental study. We also show that SSND provides distribution-free, learning-theoretic solutions to two well-known problems in hypothesis testing. First, our results provide a general solution to the general two-sample problem, that is, the problem of determining whether two random samples arise from the same distribution. Second, a specialization of SSND coincides with the standard $p$-value approach to multiple testing under the so-called random effects model. Unlike standard rejection regions based on thresholded $p$-values, the general SSND framework allows for adaptation to arbitrary alternative distributions.
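
Several of the records above concern multiple tests built from p-values, and one of them studies the inequality of Simes (1986). As a rough illustration only, the following Python sketch shows the classical Simes global test: with ordered p-values p_(1) <= ... <= p_(m), the global null hypothesis is rejected at level alpha if p_(i) <= i * alpha / m for at least one i. The function names `simes_p_value` and `simes_rejects` are hypothetical and are not taken from any of the listed papers. The level guarantee is known to hold for independent p-values; whether it holds under a given dependency structure is the kind of question the listed work on elliptical models addresses.

```python
def simes_p_value(p_values):
    """Return the Simes combined p-value min_i (m * p_(i) / i).

    Illustrative helper (hypothetical name); assumes each entry of
    p_values is a valid p-value in [0, 1].
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one p-value is required")
    ordered = sorted(p_values)  # p_(1) <= ... <= p_(m)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


def simes_rejects(p_values, alpha=0.05):
    """Reject the global null hypothesis if the Simes p-value is <= alpha.

    This is equivalent to p_(i) <= i * alpha / m for at least one rank i.
    """
    return simes_p_value(p_values) <= alpha
```

For example, `simes_rejects([0.01, 0.04, 0.03], alpha=0.05)` evaluates the three ratios 3 * 0.01 / 1, 3 * 0.03 / 2 and 3 * 0.04 / 3 and rejects because the smallest of them does not exceed 0.05.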