
Solving hard AI planning instances using curriculum-driven deep reinforcement learning

Published: 7 January 2021

ABSTRACT

Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even domain-specific specialized search methods fail quickly on hard instances because of the exponential search complexity. Our approach, based on deep reinforcement learning augmented with a curriculum-driven method, is the first to solve hard instances within one day of training, whereas other modern solvers cannot solve these instances within any reasonable time limit. In contrast to prior efforts, which use carefully handcrafted pruning techniques, our approach automatically uncovers domain structure. Our results reveal that deep RL provides a promising framework for solving previously unsolved AI planning problems, provided a proper training curriculum can be devised.
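To make the curriculum idea concrete: training starts on easy instances and advances to harder ones only once the agent becomes reliable at the current difficulty. Below is a minimal, runnable Python sketch of such a loop. The environment stub, the random agent, the number-of-boxes difficulty knob, and the promotion rule are all hypothetical placeholders chosen for illustration, not the authors' implementation; the paper's actual curriculum and network are described in the full text.

# Minimal sketch of curriculum-driven RL training. All classes and
# parameters here are hypothetical stand-ins, not the authors' code.

import random

class SokobanStub:
    """Placeholder environment: difficulty grows with num_boxes.
    A real setup would use an actual Sokoban simulator."""
    def __init__(self, num_boxes: int, max_steps: int = 50):
        self.num_boxes = num_boxes
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return (self.num_boxes, self.t)  # toy observation

    def step(self, action: int):
        self.t += 1
        done = self.t >= self.max_steps
        # Toy dynamics: harder instances are solved less often.
        solved = done and random.random() < 1.0 / self.num_boxes
        reward = 1.0 if solved else 0.0
        return (self.num_boxes, self.t), reward, done, solved

class RandomAgent:
    """Placeholder for a deep RL policy (e.g., a CNN over the board)."""
    def act(self, obs):
        return random.randrange(4)  # up / down / left / right

    def update(self, trajectory):
        pass  # a real agent would take a policy-gradient or Q-learning step

def train_with_curriculum(agent, max_boxes=4, episodes_per_stage=200,
                          promotion_threshold=0.2):
    """Advance to the next difficulty stage only once the measured
    success rate on the current stage clears the threshold."""
    num_boxes = 1
    while num_boxes <= max_boxes:
        successes = 0
        for _ in range(episodes_per_stage):
            env = SokobanStub(num_boxes)
            obs, done, solved = env.reset(), False, False
            trajectory = []
            while not done:
                action = agent.act(obs)
                obs, reward, done, solved = env.step(action)
                trajectory.append((obs, action, reward))
            agent.update(trajectory)
            successes += int(solved)
        rate = successes / episodes_per_stage
        print(f"{num_boxes} boxes: success rate {rate:.2f}")
        if rate >= promotion_threshold:
            num_boxes += 1  # promote to harder instances
        # otherwise: repeat the same stage with more training

if __name__ == "__main__":
    train_with_curriculum(RandomAgent())

The design point the sketch illustrates is that the promotion test gates difficulty on measured competence, so the agent always trains near the edge of what it can currently solve rather than facing the hardest instances cold.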



Published in

IJCAI'20: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
January 2021, 5311 pages
ISBN: 9780999241165
Copyright © 2020 International Joint Conferences on Artificial Intelligence
