On the Estimation of Confidence Intervals for Binomial Population Proportions in Astronomy: The Simplicity and Superiority of the Bayesian Approach
Ewan Cameron
Department of Physics, Swiss Federal Institute of Technology (ETH Zurich), CH-8093 Zurich, Switzerland.
Email: cameron@phys.ethz.ch
Publications of the Astronomical Society of Australia 28(2) 128-139 https://doi.org/10.1071/AS10046
Submitted: 03 December 2010 Accepted: 01 March 2011 Published: 16 June 2011
Journal Compilation © Astronomical Society of Australia 2011
Abstract
I present a critical review of techniques for estimating confidence intervals on binomial population proportions inferred from success counts in small to intermediate samples. Population proportions arise frequently as quantities of interest in astronomical research; for instance, in studies aiming to constrain the bar fraction, active galactic nucleus fraction, supermassive black hole fraction, merger fraction, or red sequence fraction from counts of galaxies exhibiting distinct morphological features or stellar populations. However, two of the most widely used techniques for estimating binomial confidence intervals (the ‘normal approximation’ and the Clopper & Pearson approach) are liable to misrepresent the degree of statistical uncertainty present under sampling conditions routinely encountered in astronomical surveys, leading to an ineffective use of the experimental data (and, worse, an inefficient use of the resources expended in obtaining that data). Hence, I provide here an overview of the fundamentals of binomial statistics with two principal aims: (i) to reveal the ease with which (Bayesian) binomial confidence intervals with more satisfactory behaviour may be estimated from the quantiles of the beta distribution using modern mathematical software packages (e.g. R, MATLAB, Mathematica, IDL, Python); and (ii) to demonstrate convincingly the major flaws of both the ‘normal approximation’ and the Clopper & Pearson approach for error estimation.
Keywords: methods: data analysis — methods: statistical
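As a minimal sketch of aim (i) in the abstract, the following Python computes an interval for a binomial proportion from the quantiles of a beta distribution and sets it beside the ‘normal approximation’. The abstract does not specify the prior or the interval convention, so the uniform prior (giving a Beta(k + 1, n − k + 1) posterior), the equal-tailed interval and the default level c = 0.683 are assumptions made here for illustration. The function names are hypothetical. The quantile is obtained with the standard library only, by bisection on the beta CDF for integer shape parameters; where SciPy is available, `scipy.stats.beta.ppf` should give the same quantiles directly.

```python
from math import comb, sqrt


def beta_cdf_int(p, a, b):
    """CDF of Beta(a, b) at p, valid for positive integer a and b.

    Uses the identity I_p(a, b) = P(X >= a) with X ~ Binomial(a + b - 1, p).
    """
    m = a + b - 1
    return sum(comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(a, m + 1))


def beta_quantile_int(q, a, b, tol=1e-12):
    """Quantile of Beta(a, b) found by bisection on the CDF (integer a, b)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_interval(k, n, c=0.683):
    """Equal-tailed interval at level c for k successes in n trials.

    Assumes a uniform prior, so the posterior is Beta(k + 1, n - k + 1).
    """
    a, b = k + 1, n - k + 1
    tail = (1.0 - c) / 2.0
    return beta_quantile_int(tail, a, b), beta_quantile_int(1.0 - tail, a, b)


def normal_approx_interval(k, n, z=1.0):
    """'Normal approximation' interval: p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = k / n
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    # Hypothetical sample: no successes in ten trials.
    print("beta quantiles:      ", beta_interval(0, 10))
    print("normal approximation:", normal_approx_interval(0, 10))
```

For a sample with no successes (k = 0) the normal-approximation interval collapses to zero width, whereas the beta-quantile interval keeps a finite upper bound. This is one simple instance of the misrepresented uncertainty described in the abstract.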
References
Agresti, A. and Coull, B. A., 1998, The American Statistician, 52, 119
Baldry, I. K., Balogh, M. L., Bower, R. G., Glazebrook, K., Nicol, R. C., Bamford, S. P. and Budavari, T., 2006, MNRAS, 373, 469
Burgasser, A. J., Kirkpatrick, J. D., Reid, N. I., Brown, M. E., Miskey, C. L. and Gizis, J. E., 2003, ApJ, 586, 512
Brown, L. D., Cai, T. T. and DasGupta, A., 2001, Statistical Science, 16, 101
Brown, L. D., Cai, T. T. and DasGupta, A., 2002, The Annals of Statistics, 30, 160
Cameron, E. et al., 2010, MNRAS, 409, 346
Clopper, C. J. and Pearson, E. S., 1934, Biometrika, 26, 404
Conselice, C. J., Rajgor, S. and Myers, R., 2008, , , 909
Cousins, R. D., Hymes, K. E. and Tucker, T., 2009, NIM, 612, 388
De Propris, R., Liske, J., Driver, S. P., Allen, P. D. and Cross, N. J. G., 2005, AJ, 130, 1516
Elmegreen, D. M., Elmegreen, B. G. and Bellin, A. D., 1990, ApJ, 364, 415
Gehrels, N., 1986, ApJ, 303, 336
Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B., 2003, Bayesian Data Analysis (New York: Chapman & Hall)
Hester, J. A., 2010, ApJ, 720, 191
Ilbert, O. et al., 2010, ApJ, 709, 644
Kraft, R. P., Burrows, D. N. and Nousek, J. A., 1991, ApJ, 374, 344
Quirin, W. L., 1978, Probability and Statistics (New York: Harper & Row Publishers)
López-Sanjuan, C., Balcells, M., Pérez-González, P. G., Barro, G., Gallego, J. and Zamorano, J., 2010, A&A, 518, 20
Nair, P. B. and Abraham, R. G., 2010, ApJL, 714, L260
Neyman, J., 1935, The Annals of Mathematical Statistics, 6, 111
Rao, M. M. and Swift, R. J., 2006, Mathematics and Its Applications, , 582
Ross, T. D., 2003, Computers in Biology and Medicine, 33, 509
Santner, T. J., 1998, Teaching Statistics, 20, 20–23
van den Bergh, S., 2002, AJ, 124, 782
Vollset, S. E., 1993, Statistics in Medicine, 12, 809
Wald, A. and Wolfowitz, J., 1939, The Annals of Mathematical Statistics, 10, 105