research-article

Deep hurdle networks for zero-inflated multi-target regression: application to multiple species abundance estimation

Authors:
Shufeng Kong (Department of Computer Science, Cornell University)
Junwen Bai (Department of Computer Science, Cornell University)
Jae Hee Lee (School of Computer Science and Informatics, Cardiff University, UK)
Di Chen (Department of Computer Science, Cornell University)
Andrew Allyn (Gulf of Maine Research Institute)
Michelle Stuart (Department of Ecology, Evolution, and Natural Resources, Rutgers University)
Malin Pinsky (Department of Ecology, Evolution, and Natural Resources, Rutgers University)
Katherine Mills (Gulf of Maine Research Institute)
Carla P. Gomes (Department of Computer Science, Cornell University)

IJCAI'20: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, January 2021, Article No.: 603, Pages 4375–4381

Published: 07 January 2021

ABSTRACT

A key problem in computational sustainability is to understand the distribution of species across landscapes over time. This question gives rise to challenging large-scale prediction problems since (i) hundreds of species have to be simultaneously modeled and (ii) the survey data are usually inflated with zeros due to the absence of species for a large number of sites. The problem of tackling both issues simultaneously, which we refer to as the zero-inflated multi-target regression problem, has not been addressed by previous methods in statistics and machine learning. In this paper, we propose a novel deep model for the zero-inflated multi-target regression problem. To this end, we first model the joint distribution of multiple response variables as a multivariate probit model and then couple the positive outcomes with a multivariate log-normal distribution. By penalizing the difference between the two distributions' covariance matrices, a link between both distributions is established. The whole model is cast as an end-to-end learning framework and we provide an efficient learning algorithm for our model that can be fully implemented on GPUs. We show that our model outperforms the existing state-of-the-art baselines on two challenging real-world species distribution datasets concerning bird and fish populations.
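
As a rough illustration of the hurdle idea described in the abstract, the following Python sketch combines a presence/absence term with a log-normal term for positive abundances and adds a penalty on the difference between two covariance matrices. It is not the authors' implementation. It treats each species independently (a univariate probit presence probability and a univariate log-normal density) rather than using the joint multivariate probit and multivariate log-normal distributions of the paper. The squared Frobenius distance used for the penalty, the `weight` parameter, and all function names are assumptions made for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hurdle_nll(y, presence_score, log_mean, log_std):
    """Negative log-likelihood of one target under a probit/log-normal hurdle.

    The presence probability is p = Phi(presence_score).
    A zero observation contributes -log(1 - p).
    A positive observation contributes -log(p) minus the log-normal
    log-density of y with parameters (log_mean, log_std).
    """
    p = normal_cdf(presence_score)
    if y == 0:
        return -math.log(1.0 - p)
    z = (math.log(y) - log_mean) / log_std
    log_density = -math.log(y * log_std * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
    return -math.log(p) - log_density


def covariance_penalty(cov_presence, cov_abundance):
    """Squared Frobenius distance between two matrices (assumed penalty form)."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(cov_presence, cov_abundance)
        for a, b in zip(row_a, row_b)
    )


def hurdle_loss(ys, presence_scores, log_means, log_stds,
                cov_presence, cov_abundance, weight=0.1):
    """Sum of per-target hurdle losses plus the weighted covariance penalty.

    Simplification: targets are scored independently here, so the covariance
    matrices enter only through the penalty term.
    """
    nll = sum(
        hurdle_nll(y, s, m, sd)
        for y, s, m, sd in zip(ys, presence_scores, log_means, log_stds)
    )
    return nll + weight * covariance_penalty(cov_presence, cov_abundance)


if __name__ == "__main__":
    # Two hypothetical species at one site: the first absent, the second present.
    loss = hurdle_loss(
        ys=[0.0, 3.0],
        presence_scores=[-1.0, 0.8],
        log_means=[0.0, 1.0],
        log_stds=[1.0, 0.5],
        cov_presence=[[1.0, 0.3], [0.3, 1.0]],
        cov_abundance=[[1.0, 0.1], [0.1, 1.0]],
    )
    print(loss)
```

In a deep model along these lines, the presence scores and log-normal parameters would be outputs of a neural network, and the loss would be minimized by gradient descent. Here they are passed in as plain numbers to keep the sketch self-contained.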

Index Terms

(auto-classified)

1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information systems applications

Published in
IJCAI'20: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
January 2021
5311 pages
ISBN: 9780999241165
Editor: Christian Bessiere
Copyright © 2020 International Joint Conferences on Artificial Intelligence
Qualifiers
- research-article
- Research
- Refereed limited