Search Results

Now showing 1 - 10 of 10

Simultaneous statistical inference for epigenetic data

2015, Schildknecht, Konstantin, Olek, Sven, Dickhaus, Thorsten

Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify, we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.

On an extended interpretation of linkage disequilibrium in genetic case-control association studies

2014, Dickhaus, Thorsten, Stange, Jens, Demirhan, Haydar

We are concerned with statistical inference for 2 x 2 x K contingency tables in the context of genetic case-control association studies. Multivariate methods based on asymptotic Gaussianity of vectors of test statistics require information about the asymptotic correlation structure among these test statistics under the global null hypothesis. We show that for a wide variety of test statistics this asymptotic correlation structure is given by the linkage disequilibrium matrix of the K loci under investigation. Three popular choices of test statistics are discussed for illustration.

Utilizing anatomical information for signal detection in functional magnetic resonance imaging

2021, Neumann, André, Peitek, Norman, Brechmann, André, Tabelow, Karsten, Dickhaus, Thorsten

We are considering the statistical analysis of functional magnetic resonance imaging (fMRI) data. As demonstrated in previous work, grouping voxels into regions (of interest) and carrying out a multiple test for signal detection on the basis of these regions typically leads to a higher sensitivity when compared with voxel-wise multiple testing approaches. In the case of a multi-subject study, we propose to define the regions for each subject separately based on their individual brain anatomy, represented, e.g., by so-called Aparc labels. The aggregation of the subject-specific evidence for the presence of signals in the different regions is then performed by means of a combination function for p-values. We apply the proposed methodology to real fMRI data and demonstrate that our approach can perform comparably to a two-stage approach for which two independent experiments are needed, one for defining the regions and one for actual signal detection.
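As a rough illustration of what a combination function for p-values does, the sketch below implements Fisher's method, which pools independent p-values (here, one per subject for a given region) into a single p-value. The abstract does not say which combination function the authors use, so the choice of Fisher's method and the name `fisher_combination` are assumptions made only for this example.

```python
import math


def fisher_combination(p_values):
    """Combine independent p-values with Fisher's method.

    Illustrative only: Fisher's method is one common combination
    function, not necessarily the one used in the paper.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    # Under the global null, -2 * sum(log p_i) is chi-square with 2k
    # degrees of freedom when the p-values are independent and uniform.
    statistic = -2.0 * sum(math.log(p) for p in p_values)

    # For even degrees of freedom 2k the chi-square survival function has
    # the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical per-subject p-values for one anatomical region.
combined = fisher_combination([0.05, 0.05])
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form survival function.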

More specific signal detection in functional magnetic resonance imaging by false discovery rate control for hierarchically structured systems of hypotheses

2015, Schildknecht, Konstantin, Tabelow, Karsten, Dickhaus, Thorsten

Signal detection in functional magnetic resonance imaging (fMRI) inherently involves the problem of testing a large number of hypotheses. A popular strategy to address this multiplicity is the control of the false discovery rate (FDR). In this work we consider the case where prior knowledge is available to partition the set of all hypotheses into disjoint subsets or families, e.g., by a priori knowledge on the functionality of certain regions of interest. If the proportion of true null hypotheses differs between families, this structural information can be used to increase statistical power. We propose a two-stage multiple test procedure which first excludes those families from the analysis for which there is no strong evidence for containing true alternatives. We show control of the family-wise error rate at this first stage of testing. Then, at the second stage, we proceed to test the hypotheses within each non-excluded family and obtain asymptotic control of the FDR within each family in this second stage. Our main mathematical result is that this two-stage strategy implies asymptotic control of the FDR with respect to all hypotheses. In simulations we demonstrate the increased power of this new procedure in comparison with established procedures in situations with highly unbalanced families. Finally, we apply the proposed method to simulated and to real fMRI data.
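For readers unfamiliar with FDR control, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure applied to a single family of hypotheses. It is not the two-stage procedure proposed in the paper; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking the hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank i with p_(i) <= i * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank

    # Reject every hypothesis whose p-value rank is at most the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five hypotheses within one family.
decisions = benjamini_hochberg([0.001, 0.2, 0.012, 0.04, 0.5], q=0.05)
```

In a family-structured setting such as the one described above, a step like this could be run separately within each family that survives a first screening stage, though the exact second-stage rule is defined in the paper itself.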

On the Simes inequality in elliptical models

2014, Bodnar, Taras, Dickhaus, Thorsten

We provide necessary and sufficient conditions for the validity of the inequality of Simes (1986) in models with elliptical dependencies. Necessary conditions are presented in terms of sufficient conditions for the reverse Simes inequality. One application of our main results concerns the problem of model misspecification, in particular the case that the assumption of Gaussianity of test statistics is violated. Since our sufficient conditions require nonnegativity of correlation coefficients between test statistics, we also develop exact tests for vectors of correlation coefficients.
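The Simes inequality underlies the Simes global test, which rejects the global null hypothesis at level alpha if the i-th smallest of m p-values is at most i * alpha / m for some i. The sketch below computes the corresponding global p-value; the function names are hypothetical, and whether the test actually keeps its level depends on the dependency structure among the test statistics, which is the question the paper studies.

```python
def simes_p_value(p_values):
    """Global p-value of the Simes test: min over i of m * p_(i) / i,
    where p_(1) <= ... <= p_(m) are the ordered p-values."""
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    ordered = sorted(p_values)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


def simes_rejects(p_values, alpha=0.05):
    """True if the Simes test rejects the global null at level alpha."""
    return simes_p_value(p_values) <= alpha


# Hypothetical marginal p-values for three test statistics.
global_p = simes_p_value([0.01, 0.04, 0.03])
```

The global p-value never exceeds the Bonferroni-adjusted value m * min(p), so the Simes test rejects whenever the Bonferroni global test does.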

Uncertainty quantification for the family-wise error rate in multivariate copula models

2013, Stange, Jens, Bodnar, Taras, Dickhaus, Thorsten

We derive confidence regions for the realized family-wise error rate (FWER) of certain multiple tests which are empirically calibrated at a given (global) level of significance. To this end, we regard the FWER as a derived parameter of a multivariate parametric copula model. It turns out that the resulting confidence regions are typically very much concentrated around the target FWER level, while generic multiple tests with fixed thresholds are in general not FWER-exhausting. Since FWER level exhaustion and optimization of power are equivalent for the classes of multiple test problems studied in this paper, the aforementioned findings militate strongly in favour of estimating the dependency structure (i.e., copula) and incorporating it in a multivariate multiple test procedure. We illustrate our theoretical results by considering two particular classes of multiple test problems of practical relevance in detail, namely, multiple tests for components of a mean vector and multiple support tests.

Simultaneous Bayesian analysis of contingency tables in genetic association studies

2014, Dickhaus, Thorsten

Genetic association studies lead to simultaneous categorical data analysis. The sample for every genetic locus consists of a contingency table containing the numbers of observed genotype-phenotype combinations. Under case-control design, the row counts of every table are identical and fixed, while column counts are random. The aim of the statistical analysis is to test independence of the phenotype and the genotype at every locus. We present an objective Bayesian methodology for these association tests, utilizing the Bayes factor proposed by Good (1976) and Crook and Good (1980). It relies on the conjugacy of Dirichlet and multinomial distributions, where the hyperprior for the Dirichlet parameter is log-Cauchy. Being based on the likelihood principle, the Bayesian tests avoid looping over all tables with given marginals. Hence, their computational burden does not increase with the sample size, in contrast to frequentist exact tests. Making use of data generated by The Wellcome Trust Case Control Consortium (2007), we illustrate that the ordering of the Bayes factors shows a good agreement with that of frequentist p-values. Furthermore, we deal with specifying prior probabilities for the validity of the null hypotheses, by taking linkage disequilibrium structure into account and exploiting the concept of effective numbers of tests. Application of a Bayesian decision theoretic multiple test procedure to The Wellcome Trust Case Control Consortium (2007) data illustrates the proposed methodology. Finally, we discuss two methods for reconciling frequentist and Bayesian approaches to the multiple association test problem for contingency tables in genetic association studies.

Computing and approximating multivariate chi-square probabilities

2014, Stange, Jens, Loginova, Nina, Dickhaus, Thorsten

We consider computational methods for evaluating and approximating multivariate chi-square probabilities in cases where the pertaining correlation matrix or blocks thereof have a low-factorial representation. To this end, techniques from matrix factorization and probability theory are applied. We outline a variety of statistical applications of multivariate chi-square distributions and provide a system of MATLAB programs implementing the proposed algorithms. Computer simulations demonstrate the accuracy and the computational efficiency of our methods in comparison with Monte Carlo approximations, and a real data example from statistical genetics illustrates their usage in practice.

Self-concordant profile empirical likelihood ratio tests for the population correlation coefficient: A simulation study

2014, Dickhaus, Thorsten

We present results of a simulation study regarding the finite-sample type I error behavior of the self-concordant profile empirical likelihood ratio (ELR) test for the population correlation coefficient. Three different families of bivariate elliptical distributions are taken into account. Uniformly over all considered models and parameter configurations, the self-concordant profile ELR test does not keep the significance level for finite sample sizes, although the level exceedance decreases monotonically to zero as the sample size increases. We discuss some potential modifications to address this problem.

On multivariate chi-square distributions and their applications in testing multiple hypotheses

2014, Dickhaus, Thorsten, Royen, Thomas

We are concerned with three different types of multivariate chi-square distributions. Their members play important roles as limiting distributions of vectors of test statistics in several applications of multiple hypotheses testing. We explain these applications and provide formulas for computing multiplicity-adjusted p-values under the respective global hypothesis.