Factored reinforcement learning using extended sequence trees

2015
Şahin, Coşkun
Reinforcement Learning (RL) is concerned with learning how to act in an environment so as to reach a final state while gaining the maximum amount of reward. The Markov Decision Process (MDP) is the formal framework used to define an RL task. In addition to the various techniques proposed to solve MDPs, several studies aim to improve RL algorithms, because these methods are often inadequate for real-world problems. Classical approaches require enumerating all possible states to find a solution; but when states are described by a number of features of the environment, the state space grows exponentially, a phenomenon known as the curse of dimensionality. Environments can be modeled more compactly by taking advantage of a structured representation. Factored Markov Decision Processes (FMDPs) serve this purpose, and on top of this structure, Factored Reinforcement Learning (FRL) methods are applied to exploit the structured representation. However, even this approach may not be sufficient for large-scale problems: with a huge number of states and actions to consider, the learning process requires more time and resources. In this thesis, we propose a compact factored structure to address this problem. Automatic detection and use of temporal abstractions during learning is a proven way to increase learning speed: repeating patterns are found in different parts of the problem, and a common sub-policy is reused for all of them instead of exploring the solution again and again. The Extended Sequence Tree (EST) algorithm is an automatic temporal abstraction detection technique that uses the history of states and actions to store frequently used patterns in a structured manner and offers alternative actions to the underlying RL algorithm. In this work, we propose a factored automatic temporal abstraction method based on the extended sequence tree that accounts for state differences via state variable changes in successive states.
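To make the idea of storing frequently used patterns in a tree concrete, the following is a minimal illustrative sketch only, not the thesis's EST algorithm: a trie over action sequences that counts how often each pattern occurs in episode histories. The real EST also records states and offers abstract actions to the learner; the class name `SeqTree` and the symbol alphabet here are invented for illustration.

```python
# Toy sketch (hypothetical, not the EST algorithm): a trie that counts
# repeated action patterns across episode histories. Frequently seen
# prefixes are candidates for reusable sub-policies.

class SeqTree:
    def __init__(self):
        self.children = {}  # symbol -> SeqTree
        self.count = 0      # how many inserted sequences pass through this node

    def insert(self, seq):
        """Insert one action sequence, incrementing counts along its path."""
        node = self
        for sym in seq:
            node = node.children.setdefault(sym, SeqTree())
            node.count += 1

    def frequent(self, min_count, prefix=()):
        """Yield (pattern, count) for every pattern seen at least min_count times."""
        for sym, child in self.children.items():
            pat = prefix + (sym,)
            if child.count >= min_count:
                yield pat, child.count
                yield from child.frequent(min_count, pat)


if __name__ == "__main__":
    tree = SeqTree()
    for episode in [("R", "R", "U"), ("R", "R", "D"), ("R", "U")]:
        tree.insert(episode)
    print(dict(tree.frequent(2)))  # patterns shared by at least 2 episodes
```

A real sequence tree would be built from state-action histories and consulted during learning to propose the continuation of a known pattern; this sketch only shows the counting structure.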
The aim is to store useful history portions more compactly to avoid excessive memory usage. The proposed method has been shown to provide significant memory gain on selected benchmark problems.
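As background for the factored representation the abstract describes, here is a minimal sketch, assuming a toy 4x4 grid task: states are tuples of feature values (here just x and y coordinates), so the Q-table is keyed by feature vectors rather than a flat enumerated state index. This is ordinary tabular Q-learning for illustration, not the thesis's method; all names and parameters are invented.

```python
import random

# Illustrative only: factored states as feature tuples in tabular Q-learning.
# Agent starts at (0, 0) on a 4x4 grid; reward 1 on reaching the goal (3, 3).

ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # right, left, up, down
GOAL = (3, 3)

def step(state, action):
    """Apply an action with wall clipping; reward 1 at the goal, else 0."""
    x = min(3, max(0, state[0] + action[0]))
    y = min(3, max(0, state[1] + action[1]))
    nxt = (x, y)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # keyed by (factored state tuple, action index)
    for _ in range(episodes):
        s, done = (0, 0), False
        while not done:
            a = (rng.randrange(4) if rng.random() < eps
                 else max(range(4), key=lambda i: q.get((s, i), 0.0)))
            s2, r, done = step(s, ACTIONS[a])
            best = max(q.get((s2, i), 0.0) for i in range(4))
            q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * best - q.get((s, a), 0.0))
            s = s2
    return q

def greedy_path(q, max_steps=12):
    """Roll out the greedy policy from the start state."""
    s, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        a = max(range(4), key=lambda i: q.get((s, i), 0.0))
        s, _, done = step(s, ACTIONS[a])
        path.append(s)
        if done:
            break
    return path
```

The point of the sketch is the Q-table key: because states are feature tuples, nothing in the code depends on enumerating the full state space in advance, which is the property factored representations exploit.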

Suggestions

A Heuristic temporal difference approach with adaptive grid discretization
Fikir, Ozan Bora; Polat, Faruk; Department of Computer Engineering (2016)
Reinforcement learning (RL), as an area of machine learning, tackles the problem defined in an environment where an autonomous agent ought to take actions to achieve an ultimate goal. In RL problems, the environment is typically formulated as a Markov decision process. However, in real-life problems the environment is rarely flawless enough to be formulated as an MDP, and the full observability assumption of the MDP must be relaxed. The resulting model is the partially observable Markov decision process, which is a more r...
Effective subgoal discovery and option generation in reinforcement learning
Demir, Alper; Polat, Faruk; Department of Computer Engineering (2016)
Subgoal discovery is proven to be a practical way to cope with large state spaces in Reinforcement Learning. Subgoals are natural hints to partition the problem into sub-problems, allowing the agent to solve each sub-problem separately. Identification of such subgoal states in the early phases of the learning process increases the learning speed of the agent. In a problem modeled as a Markov Decision Process, subgoal states possess key features that distinguish them from the ordinary ones. A learning agent ...
Automatic identification of transitional bottlenecks in reinforcement learning under partial observability
Aydın, Hüseyin; Polat, Faruk; Department of Computer Engineering (2017)
Instance-based methods are proven tools to solve reinforcement learning problems with hidden states. Nearest Sequence Memory (NSM) is a widely known instance-based approach mainly based on k-Nearest Neighbor algorithm. NSM keeps track of raw history of action-observation-reward instances within a fixed length (or ideally unlimited) memory. It calculates the neighborhood for the current state through a recursive comparison of the matching action-observation-reward tuples with the previous ones. The ones with...
Improving reinforcement learning using distinctive clues of the environment
Demir, Alper; Polat, Faruk; Department of Computer Engineering (2019)
Effective decomposition and abstraction have been shown to improve the performance of Reinforcement Learning. An agent can use clues from the environment either to partition the problem into sub-problems or to get informed about its progress in a given task. In a fully observable environment such clues may come from subgoals, while in a partially observable environment they may be provided by unique experiences. The contribution of this thesis is twofold; first, improvements over automatic subgoal identifica...
Effect of human prior knowledge on game success and comparison with reinforcement learning
Hasanoğlu, Mert.; Çakır, Murat Perit; Department of Cognitive Sciences (2019)
This study aims to find out the effect of prior knowledge on the success of humans in a non-rewarding game environment, and then to compare human performance with a reinforcement learning method in an effort to observe to what extent this method can be brought closer to human behavior and performance with the data obtained. For this purpose, different versions of a simple 2D game were used, and data were collected from 32 participants. At the end of the experiment, it is concluded that prior knowledge, such...
Citation Formats
C. Şahin, “Factored reinforcement learning using extended sequence trees,” M.S. - Master of Science, Middle East Technical University, 2015.