Using mixture models to detect differentially expressed genes

G. J. McLachlan; R. W. Bean; L. Ben-Tovim Jones; J. X. Zhu

doi:10.1071/EA05051

RESEARCH ARTICLE


G. J. McLachlan ^A ^B ^C ^D , R. W. Bean ^B , L. Ben-Tovim Jones ^B and J. X. Zhu ^B

Author Affiliations

^A Department of Mathematics, University of Queensland, Qld 4072, Australia.

^B ARC Centre in Bioinformatics, Institute for Molecular Bioscience, University of Queensland, Qld 4072, Australia.

^C ARC Special Research Centre for Functional and Applied Genomics, University of Queensland, Qld 4072, Australia.

^D Corresponding author. Email: gjm@maths.uq.edu.au

Australian Journal of Experimental Agriculture 45(8) 859–866 https://doi.org/10.1071/EA05051
Submitted: 14 February 2005; Accepted: 6 May 2005; Published: 26 August 2005

Abstract

An important and common problem in microarray experiments is the detection of genes that are differentially expressed across a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local false discovery rate is provided for each gene, and the approach can be implemented so that the implied global false discovery rate is bounded, as with the Benjamini-Hochberg methodology based on tail areas. The latter procedure is too conservative unless it is modified to take into account the prior probability that a gene is not differentially expressed. An attractive feature of the mixture model approach is that it provides a framework for estimating this probability and for using the estimate in forming a decision rule. The rule can also be formed to take the false negative rate into account.
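The sketch below shows how the quantities named in the abstract can fit together. It is an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that each gene has already been summarised by a z-score, that the theoretical null N(0, 1) serves as the null density f0, and that the non-null density is a single normal N(mu1, sd1^2) fitted by EM. These modelling choices and every function name (`fit_mixture`, `local_fdr`, `select_by_implied_fdr`, `estimated_rates`, `benjamini_hochberg`) are hypothetical. By Bayes' formula, the local false discovery rate of a gene is its posterior probability of being null, tau0(z) = pi0 f0(z) / f(z). Averaging tau0 over the selected genes estimates the implied global FDR, and averaging 1 - tau0 over the remaining genes estimates the false negative rate. The Benjamini-Hochberg step-up procedure is included for comparison: passing an estimate of pi0 gives the adjusted, less conservative version.

```python
import math
import random


def _log_normal_pdf(x, mu, sd):
    # log N(mu, sd^2) density, kept in log space so extreme z-scores do not underflow
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def local_fdr(z, pi0, mu1, sd1):
    """tau0(z) = pi0 f0(z) / f(z) by Bayes' formula, with f0 = N(0, 1) assumed as the null."""
    out = []
    for x in z:
        # d = log[(1 - pi0) f1(x)] - log[pi0 f0(x)], so tau0 = 1 / (1 + e^d)
        d = (math.log(1 - pi0) + _log_normal_pdf(x, mu1, sd1)
             - math.log(pi0) - _log_normal_pdf(x, 0.0, 1.0))
        e = math.exp(-abs(d))  # exponent is never positive, so no overflow
        out.append(e / (1 + e) if d > 0 else 1 / (1 + e))
    return out


def fit_mixture(z, pi0=0.8, mu1=2.0, sd1=1.0, n_iter=200):
    """EM for f(z) = pi0 N(0, 1) + (1 - pi0) N(mu1, sd1^2); the null component stays fixed."""
    n = len(z)
    for _ in range(n_iter):
        w = [1.0 - t for t in local_fdr(z, pi0, mu1, sd1)]  # E-step: P(non-null | z)
        sw = sum(w)
        if sw == 0.0:
            break  # every gene looks null; nothing left to fit
        pi0 = min(max(1.0 - sw / n, 1e-6), 1.0 - 1e-6)  # M-step, kept inside (0, 1)
        mu1 = sum(wi * x for wi, x in zip(w, z)) / sw
        var1 = sum(wi * (x - mu1) ** 2 for wi, x in zip(w, z)) / sw
        sd1 = max(math.sqrt(var1), 1e-3)  # guard against a collapsing component
    return pi0, mu1, sd1


def select_by_implied_fdr(tau0, alpha):
    """Largest set of genes (smallest tau0 first) whose mean tau0, the implied FDR, is <= alpha."""
    order = sorted(range(len(tau0)), key=lambda j: tau0[j])
    total, k = 0.0, 0
    for rank, j in enumerate(order, start=1):
        total += tau0[j]
        if total / rank > alpha:  # running mean of sorted values never decreases
            break
        k = rank
    return set(order[:k])


def estimated_rates(tau0, selected):
    """Estimated FDR (mean tau0 over selected) and FNR (mean 1 - tau0 over the rest)."""
    sel = [tau0[j] for j in selected]
    rest = [1.0 - t for j, t in enumerate(tau0) if j not in selected]
    fdr = sum(sel) / len(sel) if sel else 0.0
    fnr = sum(rest) / len(rest) if rest else 0.0
    return fdr, fnr


def two_sided_p(z):
    # two-sided p-value 2 * (1 - Phi(|z|)) under the N(0, 1) null
    return [math.erfc(abs(x) / math.sqrt(2)) for x in z]


def benjamini_hochberg(pvals, alpha, pi0=1.0):
    """BH step-up at level alpha; pi0 < 1 gives the pi0-adjusted (adaptive) version."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if pvals[j] <= rank * alpha / (pi0 * m):
            k = rank  # step-up: keep the largest rank that passes
    return set(order[:k])


if __name__ == "__main__":
    random.seed(1)
    # simulated z-scores: 900 null genes and 100 differentially expressed genes
    z = [random.gauss(0, 1) for _ in range(900)] + [random.gauss(3, 1) for _ in range(100)]
    pi0, mu1, sd1 = fit_mixture(z)
    tau0 = local_fdr(z, pi0, mu1, sd1)
    sel = select_by_implied_fdr(tau0, 0.1)
    print("pi0 estimate:", round(pi0, 3), "selected:", len(sel), "FDR, FNR:", estimated_rates(tau0, sel))
    p = two_sided_p(z)
    print("BH:", len(benjamini_hochberg(p, 0.1)), "adjusted BH:", len(benjamini_hochberg(p, 0.1, pi0)))
```

When the null p-values are independent and uniform, the unadjusted BH procedure controls the FDR at level pi0 * alpha rather than alpha. Dividing alpha by an estimate of pi0 recovers some of the lost power, and in this sketch the adjusted call can only select a superset of the unadjusted one. The simulated data in the demo are purely illustrative and are not taken from the paper.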

Additional keywords: multiple hypothesis testing, false discovery rate, Bayes formula, Bayes rule.

References

Allison DB, Gadbury GL, Heo M, Fernandez JR, Lee C-K, Prolla TA, Weindruch R (2002) A mixture model approach for the analysis of microarray gene expression data. Computational Statistics and Data Analysis 39, 1–20.

Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 57, 289–300.

Benjamini Y, Yekutieli D (2001) The control of the false discovery rate under dependency. Annals of Statistics 29, 1165–1188.

Black MA (2004) A note on the adaptive control of false discovery rates. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 66, 297–304.

Broët P, Lewin A, Richardson S, Dalmasso C, Magdelenat H (2004) A mixture model-based strategy for selecting sets of genes in multiclass response microarray experiments. Bioinformatics (Oxford, England) 20, 2562–2571.

Cox DR, Wong MY (2004) A simple procedure for the selection of significant effects. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 66, 395–400.

Cui X, Churchill GA (2003) Statistical tests for differential expression in cDNA microarray experiments. Genome Biology 4, 210–219.

DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680–686.

Diaconis P, Ylvisaker D (1985) Quantifying prior opinion. In ‘Bayesian statistics 2’. (Eds JM Bernardo, MH DeGroot, DV Lindley, AFM Smith) pp. 133–156. (Wiley: New York)

Do K-A, Mueller P, Tang F (2003) A Bayesian mixture model for differential gene expression. Technical Report, Department of Biostatistics, University of Texas/MD Anderson Cancer Center, Houston, TX.

Dudoit S, Fridlyand J (2002) A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology 3, 0036.1–0036.21.

Dudoit S, Popper Shaffer J, Boldrick JC (2003) Multiple hypothesis testing in microarray experiments. Statistical Science 18, 71–103.

Efron B (2004) Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association 99, 96–104.

Efron B, Tibshirani R (2002) Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology 23, 70–86.

Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association 96, 1151–1160.

Genovese CR, Wasserman L (2002) Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64, 499–517.

Hedenfalk I, Ringnér M, Ben-Dor A, Yakhini Z, Chen Y, et al. (2003) Molecular classification of familial non-BRCA1/BRCA2 breast cancer. Proceedings of the National Academy of Sciences of the United States of America 100, 2532–2537.

Johnson NL, Kotz S (1970) ‘Continuous univariate distributions. Vol. 2.’ (Wiley: New York)

Kendziorski CM, Newton MA, Lan H, Gould MN (2003) On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine 22, 3899–3914.

Lee MT, Kuo FC, Whitmore GA, Sklar J (2000) Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proceedings of the National Academy of Sciences of the United States of America 97, 9834–9838.

Lehmann EL (1959) ‘Testing statistical hypotheses.’ (Wiley: New York)

McLachlan GJ, Do KA, Ambroise C (2004) ‘Analyzing microarray gene expression data.’ (Wiley: New York)

Newton MA, Kendziorski C (2003) Parametric empirical Bayes methods for microarrays. In ‘The analysis of gene expression data: methods and software’. (Eds G Parmigiani, ES Garrett, RA Irizarry, SL Zeger) pp. 254–271. (Springer: New York)

Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW (2001) On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. Journal of Computational Biology 8, 37–52.

Newton MA, Noueiry A, Sarkar D, Ahlquist P (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics (Oxford, England) 5, 155–176.

Pan W (2002) A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics (Oxford, England) 18, 546–554.

Pan W (2003) On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics (Oxford, England) 19, 1333–1340.

Reiner A, Yekutieli D, Benjamini Y (2003) Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics (Oxford, England) 19, 368–375.

Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proceedings of the National Academy of Sciences of the United States of America 93, 10614–10619.

Storey JD (2002) A direct approach to false discovery rates. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 64, 479–498.

Storey JD (2003) The positive false discovery rate: a Bayesian interpretation and the q-value. Annals of Statistics 31, 2013–2035.

Storey JD, Taylor JE, Siegmund D (2004) Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 66, 187–205.

Storey JD, Tibshirani R (2003a) SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. In ‘The analysis of gene expression data: methods and software’. (Eds G Parmigiani, ES Garrett, RA Irizarry, SL Zeger) pp. 272–290. (Springer: New York)

Storey JD, Tibshirani R (2003b) Statistical significance for genome-wide studies. Proceedings of the National Academy of Sciences of the United States of America 100, 9440–9445.

Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America 98, 5116–5121.

Wit E, McClure J (2004) ‘Statistics for microarrays: design, analysis and inference.’ (Wiley: Chichester)

Zhao Y, Pan W (2003) Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments. Bioinformatics (Oxford, England) 19, 1046–1054.
