Abstract
In the field of Sequential Decision Making (SDM), two paradigms have historically vied for supremacy: Automated Planning (AP) and Reinforcement Learning (RL). In the spirit of reconciliation, this paper reviews AP, RL and hybrid methods (e.g., novel learning-to-plan techniques) for solving Sequential Decision Processes (SDPs), focusing on their knowledge representation: symbolic, subsymbolic or a combination of both. It also covers methods for learning the structure of the SDP itself. Finally, we compare the advantages and drawbacks of the existing methods and conclude that neurosymbolic AI offers a promising approach for SDM, since it combines AP and RL with a hybrid knowledge representation.