ABSTRACT
In recent years, we have witnessed tremendous progress in deep reinforcement learning (RL) for tasks such as Go, Chess, video games, and robot control. Nevertheless, other combinatorial domains, such as AI planning, still pose considerable challenges for RL approaches. The key difficulty in those domains is that a positive reward signal becomes exponentially rare as the minimal solution length increases, so an RL agent loses its training signal. There has been promising recent progress using a curriculum-driven learning approach designed to solve a single hard instance. We present a novel automated curriculum approach that dynamically selects from a pool of unlabeled training instances of varying task complexity, guided by our difficulty quantum momentum strategy. We show how the smoothness of the task hardness impacts the final learning results. In particular, as the size of the instance pool increases, the "hardness gap" decreases, which facilitates a smoother automated curriculum-based learning process. Our automated curriculum approach dramatically improves upon previous approaches. We demonstrate our results on Sokoban, a classical PSPACE-complete planning problem that presents a great challenge even for specialized solvers. Our RL agent can solve hard instances that are far out of reach for any previous state-of-the-art Sokoban solver. In particular, our approach can uncover plans that require hundreds of steps, while the best previous search methods would take many years of computing time to solve such instances. In addition, we show that we can further boost RL performance through an intricate coupling of our automated curriculum approach with a curiosity-driven search strategy and a graph neural network representation.
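To make the idea of automated instance selection concrete, the sketch below shows one simple way a curriculum could pick training instances from an unlabeled pool: track a per-instance solve-rate estimate and prefer instances near the agent's current competence frontier. This is a minimal illustration of automated curriculum selection in general, not the paper's difficulty quantum momentum strategy; the exponential-moving-average estimate, the 0.5 target solve rate, and the agent.attempt interface are all assumptions introduced here for illustration.

    import random

    # Hypothetical sketch of an automated curriculum over an instance pool.
    # Each instance keeps an exponential moving average (EMA) of recent solve
    # outcomes; we sample instances whose estimated solve rate is close to a
    # target rate (assumed 0.5 here), i.e., near the competence frontier.

    EMA_DECAY = 0.9        # assumed smoothing factor for solve-rate estimates
    TARGET_RATE = 0.5      # assumed "frontier" solve rate to aim for

    def select_instance(pool, solve_rate):
        """Sample an instance, weighting those nearest the target solve rate."""
        weights = [1.0 / (1e-3 + abs(solve_rate[i] - TARGET_RATE)) for i in pool]
        return random.choices(pool, weights=weights, k=1)[0]

    def curriculum_loop(pool, agent, episodes):
        """Alternate instance selection and training; update difficulty estimates."""
        solve_rate = {i: 0.0 for i in pool}   # start pessimistic: nothing solved yet
        for _ in range(episodes):
            inst = select_instance(pool, solve_rate)
            solved = agent.attempt(inst)      # hypothetical: one RL training episode
            solve_rate[inst] = (EMA_DECAY * solve_rate[inst]
                                + (1.0 - EMA_DECAY) * float(solved))
        return solve_rate

A selection rule of this shape keeps the positive reward signal from becoming exponentially rare: as easy instances saturate toward a solve rate of 1.0, their weight drops and sampling shifts toward harder, not-yet-mastered instances, which is the smoothing effect a larger instance pool (smaller "hardness gap") supports.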