Publications of the Astronomical Society of Australia
RESEARCH ARTICLE (Open Access)

On the Estimation of Confidence Intervals for Binomial Population Proportions in Astronomy: The Simplicity and Superiority of the Bayesian Approach

Ewan Cameron

A Department of Physics, Swiss Federal Institute of Technology (ETH Zurich), CH-8093 Zurich, Switzerland.

B Email: cameron@phys.ethz.ch

Publications of the Astronomical Society of Australia 28(2) 128-139 https://doi.org/10.1071/AS10046
Submitted: 03 December 2010  Accepted: 01 March 2011  Published: 16 June 2011

Journal Compilation © Astronomical Society of Australia 2011

Abstract

I present a critical review of techniques for estimating confidence intervals on binomial population proportions inferred from success counts in small to intermediate samples. Population proportions arise frequently as quantities of interest in astronomical research; for instance, in studies aiming to constrain the bar fraction, active galactic nucleus fraction, supermassive black hole fraction, merger fraction, or red sequence fraction from counts of galaxies exhibiting distinct morphological features or stellar populations. However, two of the most widely used techniques for estimating binomial confidence intervals, the ‘normal approximation’ and the Clopper & Pearson approach, are liable to misrepresent the degree of statistical uncertainty present under sampling conditions routinely encountered in astronomical surveys, leading to an ineffective use of the experimental data (and, worse, an inefficient use of the resources expended in obtaining those data). Hence, I provide here an overview of the fundamentals of binomial statistics with two principal aims: (i) to reveal the ease with which (Bayesian) binomial confidence intervals with more satisfactory behaviour may be estimated from the quantiles of the beta distribution using modern mathematical software packages (e.g. R, MATLAB, Mathematica, IDL, Python); and (ii) to demonstrate convincingly the major flaws of both the ‘normal approximation’ and the Clopper & Pearson approach for error estimation.
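As a rough illustration of aim (i), the following Python sketch computes an equal-tailed interval from the quantiles of a beta distribution. It rests on assumptions not stated in the abstract: a uniform prior on the population proportion, so that k successes in n trials give a Beta(k + 1, n − k + 1) posterior, and a default level of 68.3%. The function names are hypothetical, and the quantile is found by bisection on a closed-form CDF (valid for integer parameters) so that only the standard library is needed.

```python
from math import comb


def beta_cdf_int(p, a, b):
    """CDF of Beta(a, b) at p, for positive integer a and b.

    Uses the identity I_p(a, b) = P(X >= a), X ~ Binomial(a + b - 1, p).
    """
    m = a + b - 1
    return sum(comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(a, m + 1))


def beta_quantile_int(q, a, b, tol=1e-12):
    """Invert the CDF above by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def binomial_interval(k, n, level=0.683):
    """Equal-tailed interval for a proportion (assumed uniform prior)."""
    tail = 0.5 * (1.0 - level)
    a, b = k + 1, n - k + 1  # posterior Beta parameters under a uniform prior
    return beta_quantile_int(tail, a, b), beta_quantile_int(1.0 - tail, a, b)


# Hypothetical example: 3 barred galaxies in a sample of 10.
lower, upper = binomial_interval(k=3, n=10)
print(f"observed fraction 0.30, interval [{lower:.3f}, {upper:.3f}]")
```

Where SciPy is available, the same two quantiles should follow more directly from `scipy.stats.beta.ppf(q, k + 1, n - k + 1)`; the bisection here only keeps the sketch dependency-free.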

Keywords: methods: data analysis — methods: statistical


References

Agresti, A. and Coull, B. A., 1998, The American Statistician, 52, 119

Baldry, I. K., Balogh, M. L., Bower, R. G., Glazebrook, K., Nicol, R. C., Bamford, S. P. and Budavari, T., 2006, MNRAS, 373, 469

Burgasser, A. J., Kirkpatrick, J. D., Reid, N. I., Brown, M. E., Miskey, C. L. and Gizis, J. E., 2003, ApJ, 586, 512

Brown, L. D., Cai, T. T. and DasGupta, A., 2001, Statistical Science, 16, 101

Brown, L. D., Cai, T. T. and DasGupta, A., 2002, The Annals of Statistics, 30, 160

Cameron, E. et al., 2010, MNRAS, 409, 346

Clopper, C. J. and Pearson, E. S., 1934, Biometrika, 26, 404

Conselice, C. J., Rajgor, S. and Myers, R., 2008, , , 909

Cousins, R. D., Hymes, K. E. and Tucker, T., 2009, NIM, 612, 388

De Propris, R., Liske, J., Driver, S. P., Allen, P. D. and Cross, N. J. G., 2005, AJ, 130, 1516

Elmegreen, D. M., Elmegreen, B. G. and Bellin, A. D., 1990, ApJ, 364, 415

Gehrels, N., 1986, ApJ, 303, 336

Gelman, A., Carlin, J. B., Stern, H. S. and Rubin, D. B., 2003, Bayesian Data Analysis (New York: Chapman & Hall)

Hester, J. A., 2010, ApJ, 720, 191

Ilbert, O. et al., 2010, ApJ, 709, 644

Kraft, R. P., Burrows, D. N. and Nousek, J. A., 1991, ApJ, 374, 344

Quirin, W. L., 1978, Probability and Statistics (New York: Harper & Row Publishers)

López-Sanjuan, C., Balcells, M., Pérez-González, P. G., Barro, G., Gallego, J. and Zamorano, J., 2010, A&A, 518, 20

Nair, P. B. and Abraham, R. G., 2010, ApJL, 714, L260

Neyman, J., 1935, The Annals of Mathematical Statistics, 6, 111

Rao, M. M. and Swift, R. J., 2006, Mathematics and Its Applications, , 582

Ross, T. D., 2003, Computers in Biology and Medicine, 33, 509

Santner, T. J., 1998, Teaching Statistics, 20, 20–23

van den Bergh, S., 2002, AJ, 124, 782

Vollset, S. E., 1993, Statistics in Medicine, 12, 809

Wald, A. and Wolfowitz, J., 1939, The Annals of Mathematical Statistics, 10, 105