
A novel automated curriculum strategy to solve hard Sokoban planning instances

Published: 6 December 2020

ABSTRACT

In recent years, we have witnessed tremendous progress in deep reinforcement learning (RL) for tasks such as Go, Chess, video games, and robot control. Nevertheless, other combinatorial domains, such as AI planning, still pose considerable challenges for RL approaches. The key difficulty in those domains is that a positive reward signal becomes exponentially rarer as the minimal solution length increases, so an RL approach loses its training signal. There has been promising recent progress using a curriculum-driven learning approach designed to solve a single hard instance. We present a novel automated curriculum approach that dynamically selects from a pool of unlabeled training instances of varying task complexity, guided by our difficulty quantum momentum strategy. We show how the smoothness of the task hardness impacts the final learning results. In particular, as the size of the instance pool increases, the "hardness gap" decreases, which facilitates a smoother automated curriculum-based learning process. Our automated curriculum approach dramatically improves upon previous approaches. We demonstrate our results on Sokoban, a traditional PSPACE-complete planning problem that presents a great challenge even for specialized solvers. Our RL agent can solve hard instances that are far out of reach for any previous state-of-the-art Sokoban solver. In particular, our approach can uncover plans that require hundreds of steps, while the best previous search methods would take many years of computing time to solve such instances. In addition, we show that we can further boost RL performance through an intricate coupling of our automated curriculum approach with a curiosity-driven search strategy and a graph neural net representation.
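The abstract describes selecting instances from an unlabeled pool so that training difficulty tracks the agent's current frontier. The sketch below is an illustrative interpretation, not the paper's actual algorithm: it maintains a momentum-smoothed failure-rate estimate per instance (the class name, `momentum` parameter, and `target` boundary are all assumptions introduced here) and selects the instance whose estimated difficulty is closest to the success boundary, which keeps the "hardness gap" between consecutive training tasks small.

```python
class CurriculumSelector:
    """Illustrative sketch of an automated curriculum over an unlabeled
    instance pool (hypothetical; not the paper's exact method)."""

    def __init__(self, pool, momentum=0.9):
        self.pool = list(pool)                    # unlabeled training instances
        self.momentum = momentum                  # smoothing factor for difficulty
        # Running difficulty estimate per instance (failure rate in [0, 1]),
        # initialized to 0.5 since the instances are unlabeled.
        self.difficulty = {inst: 0.5 for inst in self.pool}

    def update(self, instance, solved):
        # Momentum-smoothed update: recent outcomes shift the estimate
        # gradually, so difficulty rankings change smoothly over training.
        d = self.difficulty[instance]
        outcome = 0.0 if solved else 1.0
        self.difficulty[instance] = self.momentum * d + (1 - self.momentum) * outcome

    def select(self, target=0.5):
        # Pick the instance whose estimated difficulty is closest to the
        # target success boundary (solved roughly half the time).
        return min(self.pool, key=lambda i: abs(self.difficulty[i] - target))
```

As the agent repeatedly solves an instance, its estimated difficulty decays toward zero and the selector moves on to harder instances still near the boundary; a larger pool gives a finer gradation of difficulties, mirroring the shrinking hardness gap described above.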




Published in

NIPS'20: Proceedings of the 34th International Conference on Neural Information Processing Systems
December 2020, 22651 pages
ISBN: 9781713829546

Copyright © 2020 Neural Information Processing Systems Foundation, Inc.

Publisher: Curran Associates Inc., Red Hook, NY, United States