RESEARCH ARTICLE

Using mixture models to detect differentially expressed genes

G. J. McLachlan A,B,C,D, R. W. Bean B, L. Ben-Tovim Jones B and J. X. Zhu B

A Department of Mathematics, University of Queensland, Qld 4072, Australia.

B ARC Centre in Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Qld 4072, Australia.

C ARC Special Research Centre for Functional and Applied Genomics, University of Queensland, Qld 4072, Australia.

D Corresponding author. Email: gjm@maths.uq.edu.au

Australian Journal of Experimental Agriculture 45(8) 859-866 https://doi.org/10.1071/EA05051
Submitted: 14 February 2005; Accepted: 6 May 2005; Published: 26 August 2005

Abstract

An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local false discovery rate is provided for each gene, and it can be implemented so that the implied global false discovery rate is bounded as with the Benjamini-Hochberg methodology based on tail areas. The latter procedure is too conservative, unless it is modified according to the prior probability that a gene is not differentially expressed. An attractive feature of the mixture model approach is that it provides a framework for the estimation of this probability and its subsequent use in forming a decision rule. The rule can also be formed to take the false negative rate into account.

Additional keywords: multiple hypothesis testing, false discovery rate, Bayes formula, Bayes rule.
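
As a rough illustration of the decision rule described in the abstract, the sketch below computes a local false discovery rate for each gene by Bayes' formula and then selects genes whose local rate falls below a threshold. It is a minimal sketch under stated assumptions, not the authors' implementation: each gene is assumed to be summarised by a z-score, the null density is assumed to be N(0, 1), the non-null density is assumed to be a single normal component, and the parameters are fitted by a simple EM loop. The function names (`fit_mixture`, `local_fdr`, `select_genes`) and the simulated data are hypothetical.

```python
import numpy as np


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def local_fdr(z, pi0, mu1, sd1):
    """Posterior probability that each gene is null (Bayes' formula)."""
    f0 = normal_pdf(z)            # assumed null density, N(0, 1)
    f1 = normal_pdf(z, mu1, sd1)  # assumed non-null density
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


def fit_mixture(z, n_iter=200):
    """EM for a two-component mixture with the null fixed at N(0, 1)."""
    pi0, mu1, sd1 = 0.9, float(np.quantile(z, 0.95)), 1.0  # starting values
    for _ in range(n_iter):
        tau0 = local_fdr(z, pi0, mu1, sd1)  # E-step: null posteriors
        w1 = 1.0 - tau0                     # non-null posteriors
        pi0 = float(tau0.mean())            # M-step: prior null probability
        mu1 = float(np.sum(w1 * z) / np.sum(w1))
        sd1 = float(np.sqrt(np.sum(w1 * (z - mu1) ** 2) / np.sum(w1)))
    return pi0, mu1, sd1


def select_genes(tau0, threshold):
    """Select genes with local FDR <= threshold.

    The implied global FDR is estimated by the mean local FDR over the
    selected genes, so it cannot exceed the threshold.
    """
    selected = tau0 <= threshold
    implied_fdr = float(tau0[selected].mean()) if selected.any() else 0.0
    return selected, implied_fdr


# Simulated z-scores (illustrative only): 900 null genes, 100 non-null genes.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(3.0, 1.0, 100)])

pi0, mu1, sd1 = fit_mixture(z)
tau0 = local_fdr(z, pi0, mu1, sd1)
selected, implied_fdr = select_genes(tau0, threshold=0.1)
```

Because every selected gene has a local rate no larger than the threshold, the average over the selected set is bounded by the same threshold, which is the sense in which the implied global false discovery rate is controlled in this sketch. The fitted `pi0` plays the role of the prior probability that a gene is not differentially expressed.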


References


Allison DB, Gadbury GL, Heo M, Fernandez JR, Lee C-K, Prolla TA, Weindruch R (2002) A mixture model approach for the analysis of microarray gene expression data. Computational Statistics and Data Analysis 39, 1–20.

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289–300.

Benjamini Y, Yekutieli D (2001) The control of the false discovery rate under dependency. Annals of Statistics 29, 1165–1188.

Black MA (2004) A note on the adaptive control of false discovery rates. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 66, 297–304.

Broët P, Lewin A, Richardson S, Dalmasso C, Magdelenat H (2004) A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments. Bioinformatics (Oxford, England) 20, 2562–2571.

Cox DR, Wong MY (2004) A simple procedure for the selection of significant effects. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 66, 395–400.

Cui X, Churchill GA (2003) Statistical tests for differential expression in cDNA microarray experiments. Genome Biology 4, 210–219.

DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680–686.

Diaconis P, Ylvisaker D (1985) Quantifying prior opinion. In ‘Bayesian statistics 2’. (Eds JM Bernardo, MH DeGroot, DV Lindley, AFM Smith) pp. 133–156. (Wiley: New York)

Do K-A, Mueller P, Tang F (2003) A Bayesian mixture model for differential gene expression. Technical Report, Department of Biostatistics, University of Texas/MD Anderson Cancer Center, Houston, TX.

Dudoit S, Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology, 0036.1–0036.21.

Dudoit S, Popper Shaffer J, Boldrick JC (2003) Multiple hypothesis testing in microarray experiments. Statistical Science 18, 71–103.

Efron B (2004) Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association 99, 96–104.

Efron B, Tibshirani R (2002) Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology 23, 70–86.

Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association 96, 1151–1160.

Genovese CR, Wasserman L (2002) Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64, 499–517.

Hedenfalk I, Ringnér M, Ben-Dor A, Yakhini Z, Chen Y, et al. (2003) Molecular classification of familial non-BRCA1/BRCA2 breast cancer. Proceedings of the National Academy of Sciences of the United States of America 100, 2532–2537.

Johnson NL, Kotz S (1970) ‘Continuous univariate distributions. Vol. 2.’ (Wiley: New York)

Kendziorski CM, Newton MA, Lan H, Gould MN (2003) On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 22, 3899–3914.

Lee MT, Kuo FC, Whitmore GA, Sklar J (2000) Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proceedings of the National Academy of Sciences of the United States of America 97, 9834–9838.

Lehmann EL (1959) ‘Testing statistical hypotheses.’ (Wiley: New York)

McLachlan GJ, Do KA, Ambroise C (2004) ‘Analyzing microarray gene expression data.’ (Wiley: New York)

Newton MA, Kendziorski C (2003) Parametric empirical Bayes methods for microarrays. In ‘The analysis of gene expression data: methods and software’. (Eds G Parmigiani, ES Garrett, RA Irizarry, SL Zeger) pp. 254–271. (Springer: New York)

Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. Journal of Computational Biology 8, 37–52.

Newton MA, Noueiry A, Sarkar D, Ahlquist P (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics (Oxford, England) 5, 155–176.

Pan W (2002) A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics (Oxford, England) 18, 546–554.

Pan W (2003) On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics (Oxford, England) 19, 1333–1340.

Reiner A, Yekutieli D, Benjamini Y (2003) Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics (Oxford, England) 19, 368–375.

Schena M, Shalon D, Heller R, Chai A, Brown P, Davis RW (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proceedings of the National Academy of Sciences of the United States of America 93, 10614–10619.

Storey JD (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64, 479–498.

Storey JD (2003) The positive false discovery rate: a Bayesian interpretation and the q-value. Annals of Statistics 31, 2013–2035.

Storey JD, Taylor JE, Siegmund D (2004) Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 66, 187–205.

Storey JD, Tibshirani R (2003a) SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. In ‘The analysis of gene expression data: methods and software’. (Eds G Parmigiani, ES Garrett, RA Irizarry, SL Zeger) pp. 272–290. (Springer: New York)

Storey JD, Tibshirani R (2003b) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences of the United States of America 100, 9440–9445.

Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America 98, 5116–5121.

Wit E, McClure J (2004) ‘Statistics for microarrays: design, analysis and inference.’ (Wiley: Chichester)

Zhao Y, Pan W (2003) Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments. Bioinformatics (Oxford, England) 19, 1046–1054.