Solving hard AI planning instances using curriculum-driven deep reinforcement learning

ABSTRACT
Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even specialized, domain-specific search methods fail quickly on hard instances because of the exponential growth of the search space. Our approach, based on deep reinforcement learning augmented with a curriculum-driven method, is the first to solve hard Sokoban instances within one day of training, while other modern solvers cannot solve these instances within any reasonable time limit. In contrast to prior efforts, which rely on carefully handcrafted pruning techniques, our approach automatically uncovers domain structure. Our results show that deep RL provides a promising framework for solving previously unsolved AI planning problems, provided a suitable training curriculum can be devised.
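The curriculum idea can be made concrete as a staged training loop: the agent first trains on easy subcases of a hard instance (e.g., with only a few boxes) and advances to harder subcases once its success rate clears a threshold. The following is a minimal Python sketch of such a schedule, not the paper's actual implementation; the names `Agent`, `make_subcase`, and `episode_solved` are illustrative stubs.

```python
import random

# Minimal sketch of a curriculum-driven training loop, in the spirit of
# the approach described in the abstract.  All names below (Agent,
# make_subcase, episode_solved) are illustrative stubs, not the
# authors' implementation.

class Agent:
    """Placeholder learner; a real agent would hold a deep RL policy."""

    def act(self, state):
        # Random policy stand-in for a learned Sokoban policy.
        return random.choice(["up", "down", "left", "right"])

    def update(self, transition):
        pass  # a real agent would take a gradient step here


def make_subcase(num_boxes):
    """Stand-in for deriving an easier subcase of one hard instance,
    e.g. by keeping only `num_boxes` of the original boxes."""
    return {"num_boxes": num_boxes}


def episode_solved(instance, agent, max_steps=50):
    """Stand-in rollout; a real check would execute the agent's actions
    in a Sokoban simulator and test whether all boxes reach targets."""
    for _ in range(max_steps):
        agent.act(instance)
    return random.random() < 0.3  # placeholder outcome


def curriculum_train(max_boxes, episodes_per_stage=1000, threshold=0.9):
    agent = Agent()
    # Easy-to-hard schedule: increase the number of boxes stage by stage.
    for num_boxes in range(1, max_boxes + 1):
        successes = 0
        for episode in range(1, episodes_per_stage + 1):
            instance = make_subcase(num_boxes)
            if episode_solved(instance, agent):
                successes += 1
            # Advance to the next stage once the success rate is high
            # enough over a reasonable sample of episodes.
            if episode >= 100 and successes / episode >= threshold:
                break
        print(f"stage with {num_boxes} boxes: "
              f"success rate {successes / episode:.2f}")
    return agent


if __name__ == "__main__":
    curriculum_train(max_boxes=4)
```

The key design choice is the advancement criterion: moving on only after a sustained success rate keeps the policy from being thrown into subcases it cannot yet solve, which is what makes the curriculum effective on instances that defeat direct search.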