Effective subgoal discovery and option generation in reinforcement learning

Date

2016

Author

Demir, Alper

Subgoal discovery has proven to be a practical way to cope with large state spaces in Reinforcement Learning. Subgoals are natural hints for partitioning a problem into sub-problems, allowing the agent to solve each sub-problem separately. Identifying such subgoal states in the early phases of the learning process increases the learning speed of the agent. In a problem modeled as a Markov Decision Process, subgoal states possess key features that distinguish them from ordinary states. A learning agent needs a way to reach an identified subgoal, and this can be achieved by forming an option to reach it. Most studies in the literature focus on finding useful subgoals by employing statistical and graph-based methods. On the other hand, few studies address how to improve the process of forming options. In this thesis, an efficient subgoal discovery method that makes use of local information is proposed. Compared with other methods, it has lower time complexity and does not require additional problem-specific parameters. Furthermore, a better heuristic for forming options is proposed. It focuses on collecting the set of states from which employing an option is actually useful, leading to more effective options.
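To make the notion of an option concrete, the following is a minimal sketch in Python of a standard option: a set of states where it may be started (the initiation set), an internal policy, and a termination condition. It is not the heuristic proposed in the thesis. The two-room grid, the doorway subgoal, the distance cutoff `max_dist`, and all names (`Option`, `build_option`, `run_option`) are assumptions made for illustration only, and the subgoal is assumed to be already known.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

State = Tuple[int, int]  # (row, column) in a hypothetical grid world
ACTIONS: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


@dataclass
class Option:
    """An option: where it may start, how it acts, and when it stops."""
    initiation_set: Set[State]
    policy: Dict[State, str]
    subgoal: State

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set

    def terminates(self, s: State) -> bool:
        # Stop at the subgoal, or as soon as the agent leaves the initiation set.
        return s == self.subgoal or s not in self.initiation_set


def build_option(free_cells: Set[State], subgoal: State, max_dist: int) -> Option:
    """Breadth-first search backwards from the subgoal (deterministic moves assumed).

    Every cell within max_dist steps is given the action that moves it one
    step closer to the subgoal; these cells form the initiation set.
    The cutoff max_dist is an illustrative assumption, not the thesis's rule.
    """
    policy: Dict[State, str] = {}
    dist: Dict[State, int] = {subgoal: 0}
    queue = deque([subgoal])
    while queue:
        cur = queue.popleft()
        if dist[cur] == max_dist:
            continue
        for name, (dr, dc) in ACTIONS.items():
            prev = (cur[0] - dr, cur[1] - dc)  # taking `name` in prev leads to cur
            if prev in free_cells and prev not in dist:
                dist[prev] = dist[cur] + 1
                policy[prev] = name
                queue.append(prev)
    return Option(initiation_set=set(policy), policy=policy, subgoal=subgoal)


def run_option(option: Option, start: State) -> List[State]:
    """Follow the option's policy from start until it terminates."""
    s = start
    path = [s]
    while not option.terminates(s):
        dr, dc = ACTIONS[option.policy[s]]
        s = (s[0] + dr, s[1] + dc)
        path.append(s)
    return path


# Two 3x3 rooms separated by a wall in column 3, with a doorway at (1, 3).
free_cells = {(r, c) for r in range(3) for c in range(7) if c != 3 or r == 1}
doorway_option = build_option(free_cells, subgoal=(1, 3), max_dist=3)
print(run_option(doorway_option, (1, 0)))
```

In this sketch the initiation set is chosen by a fixed distance cutoff, which is the simplest possible choice; the abstract's point is that selecting these states more carefully leads to more effective options.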

Subject Keywords

Machine learning, Artificial intelligence, Reinforcement learning

URI

http://etd.lib.metu.edu.tr/upload/12620214/index.pdf
https://hdl.handle.net/11511/25875

Collections

Graduate School of Natural and Applied Sciences, Thesis

Citation Formats

A. Demir, “Effective subgoal discovery and option generation in reinforcement learning,” M.S. - Master of Science, Middle East Technical University, 2016.