Comparing calibrated statistical and machine learning methods for wildland fire occurrence prediction: a case study of human-caused fires in Lac La Biche, Alberta, Canada
Nathan Phelps A, B and Douglas G. Woolford A, C
A Department of Statistical and Actuarial Sciences, University of Western Ontario, London N6A 3K7, Canada.
B Department of Computer Science, University of Western Ontario, London N6A 3K7, Canada.
C Corresponding author. Email: dwoolfor@uwo.ca
International Journal of Wildland Fire 30(11) 850-870 https://doi.org/10.1071/WF20139
Submitted: 1 September 2020 Accepted: 24 August 2021 Published: 29 September 2021
Journal Compilation © IAWF 2021 Open Access CC BY-NC-ND
Abstract
Wildland fire occurrence prediction (FOP) modelling supports fire management decisions, such as suppression resource pre-positioning and the routeing of detection patrols. Common empirical modelling methods for FOP include both model-based (statistical modelling) and algorithmic-based (machine learning) approaches. However, it was recently shown that many machine learning models in the FOP literature are not suitable for fire management operations, because they overpredict fire occurrence when they are not properly calibrated to output true probabilities. We present methods for properly calibrating statistical and machine learning models for fine-scale, spatially explicit daily FOP, followed by a case-study comparison of human-caused FOP modelling in the Lac La Biche region of Alberta, Canada, using data from 1996 to 2016. Calibrated bagged classification trees, random forests, neural networks, logistic regression models and logistic generalised additive models (GAMs) are compared to assess the pros and cons of these approaches when properly calibrated. Results suggest that logistic GAMs can perform similarly to machine learning models for FOP. Because statistical methods are commonly viewed as more interpretable than machine learning methods, we advocate discussing the pros and cons of different modelling approaches with fire management practitioners when determining which models to use operationally.
Keywords: artificial intelligence, classification, ensemble, forest fire occurrence prediction, generalised additive model, human-caused, supervised learning.
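The abstract does not spell out the calibration procedure. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that a model was trained after keeping only a fraction `beta` of the non-fire observations, which is one common way to cope with how rare fire cell-days are. Under that assumption, the model's output p_s can be mapped back to the full-data scale with the standard undersampling correction p = beta * p_s / (beta * p_s - p_s + 1). The function names `correct_undersampled_probs` and `calibration_summary` are hypothetical. The second function checks for the overprediction problem described above. If a model is calibrated, its predicted probabilities should sum to roughly the observed number of fires, both overall and within groups of observations with similar predicted risk.

```python
import numpy as np


def correct_undersampled_probs(p_s, beta):
    """Hypothetical helper: rescale probabilities from a model fitted to undersampled data.

    p_s  : probabilities from a model trained after keeping only a fraction
           `beta` of the non-fire (y = 0) observations (an assumption here)
    beta : sampling fraction of non-fire observations, 0 < beta <= 1
    Returns probabilities on the scale of the full, unsampled data.
    """
    if not 0 < beta <= 1:
        raise ValueError("beta must lie in (0, 1]")
    p_s = np.asarray(p_s, dtype=float)
    # Inverts p_s = p / (p + beta * (1 - p)), which is what undersampling
    # the negatives does to the true probability p.
    return beta * p_s / (beta * p_s - p_s + 1.0)


def calibration_summary(p, y, n_bins=10):
    """Hypothetical helper: compare predicted and observed fire counts.

    p : predicted fire probabilities (e.g. one per grid cell-day)
    y : observed outcomes, 1 = fire occurred, 0 = no fire
    Returns (expected count, observed count, per-bin rows), where each row is
    (mean predicted probability, observed fire rate, bin size). Bins hold
    roughly equal numbers of observations, ordered by predicted probability.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    expected, observed = p.sum(), y.sum()
    order = np.argsort(p, kind="stable")
    rows = []
    for idx in np.array_split(order, n_bins):
        if idx.size:  # skip empty bins when n_bins exceeds len(p)
            rows.append((p[idx].mean(), y[idx].mean(), idx.size))
    return expected, observed, rows


# Illustrative usage with made-up numbers (not data from this study).
p_s = np.array([0.02, 0.10, 0.50])
p = correct_undersampled_probs(p_s, beta=0.01)
expected, observed, rows = calibration_summary(p, [0, 0, 1], n_bins=3)
```

The correction is monotone increasing in p_s. It changes only the probability scale, so ranking-based metrics such as the area under the ROC curve are unaffected, while the expected fire counts used operationally are brought back in line with observed counts.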