ABSTRACT
A key problem in computational sustainability is to understand the distribution of species across landscapes over time. This question gives rise to challenging large-scale prediction problems because (i) hundreds of species must be modeled simultaneously and (ii) survey data are usually zero-inflated, since many species are absent from a large number of sites. The problem of tackling both issues simultaneously, which we refer to as the zero-inflated multi-target regression problem, has not been addressed by previous methods in statistics and machine learning. In this paper, we propose a novel deep model for the zero-inflated multi-target regression problem. We first model the joint distribution of whether each response variable is zero or positive with a multivariate probit model, and then model the positive outcomes with a multivariate log-normal distribution. The two distributions are linked by penalizing the difference between their covariance matrices. The whole model is cast as an end-to-end learning framework, and we provide an efficient learning algorithm that can be implemented entirely on GPUs. We show that our model outperforms existing state-of-the-art baselines on two challenging real-world species distribution datasets concerning bird and fish populations.
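The hurdle structure described above can be illustrated with a small NumPy sketch that scores a single site: a multivariate probit term for which species are present, a multivariate log-normal term for the positive abundances, and a penalty tying the two covariance matrices together. This is not the authors' algorithm; several choices here are assumptions made for illustration only. The probit latent covariance is assumed to have the form C + I, so that the probit probability can be estimated by averaging products of univariate normal CDFs over samples w ~ N(0, C). The penalty is assumed to be a squared Frobenius norm. The means `mu` and `m` are passed in directly, whereas in a deep model they would be outputs of a network applied to site features. All function names (`probit_log_prob`, `lognormal_log_density`, `hurdle_loss`) are hypothetical.

```python
import math

import numpy as np

# Elementwise error function (NumPy has no built-in erf).
_erf = np.vectorize(math.erf, otypes=[float])


def norm_cdf(x):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def probit_log_prob(y_present, mu, C, n_samples=2000, rng=None):
    """Monte Carlo estimate of log P(presence pattern) under a multivariate probit.

    Assumption: latent z = mu + w + e with w ~ N(0, C) and e ~ N(0, I), so the
    latent covariance is C + I. Given w, the coordinates of z are independent,
    and P(pattern | w) is a product of univariate normal CDFs.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    s = 2.0 * np.asarray(y_present, dtype=float) - 1.0  # +1 present, -1 absent
    w = rng.multivariate_normal(np.zeros(len(mu)), np.asarray(C, dtype=float), size=n_samples)
    per_sample = np.prod(norm_cdf(s * (mu + w)), axis=1)  # P(pattern | w) for each sample
    return math.log(per_sample.mean())


def lognormal_log_density(y, m, Sigma_l):
    """Log density of the positive abundances under a multivariate log-normal.

    Only species with y_j > 0 contribute; the marginal of N(m, Sigma_l) over a
    subset S of coordinates is N(m_S, Sigma_l[S, S]).
    """
    y = np.asarray(y, dtype=float)
    S = y > 0
    k = int(S.sum())
    if k == 0:
        return 0.0
    log_y = np.log(y[S])
    diff = log_y - np.asarray(m, dtype=float)[S]
    cov = np.asarray(Sigma_l, dtype=float)[np.ix_(S, S)]
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    # Gaussian log density of log(y_S) plus the Jacobian term -sum(log y_S).
    return -0.5 * (k * math.log(2.0 * math.pi) + logdet + quad) - log_y.sum()


def hurdle_loss(y, mu, C, m, Sigma_l, lam=1.0, rng=None):
    """Negative log-likelihood for one site plus a covariance-matching penalty."""
    y = np.asarray(y, dtype=float)
    Sigma_p = np.asarray(C, dtype=float) + np.eye(len(y))  # assumed probit latent covariance
    nll = -(probit_log_prob(y > 0, mu, C, rng=rng) + lognormal_log_density(y, m, Sigma_l))
    penalty = lam * np.sum((Sigma_p - np.asarray(Sigma_l, dtype=float)) ** 2)  # squared Frobenius norm
    return nll + penalty
```

Conditioning on w keeps every Monte Carlo term a smooth product of CDFs, so the estimate is differentiable in `mu` and `C`. This is one reason a sampling-based probit likelihood suits GPU training; a real implementation would batch sites and compute in log space for numerical stability.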
Deep hurdle networks for zero-inflated multi-target regression: application to multiple species abundance estimation