Factored reinforcement learning using extended sequence trees

Şahin, Coşkun
Reinforcement Learning (RL) is an area concerned with learning how to act in an environment so as to reach a final state while maximizing the reward gained along the way. The Markov Decision Process (MDP) is the formal framework used to define an RL task. In addition to the various techniques proposed to solve MDPs, there are several studies aimed at improving RL algorithms, because these methods are often inadequate for real-world problems. Classical approaches require enumerating all possible states to find a solution, but when states are described by a number of features of the environment, the state space grows exponentially with the number of features, a phenomenon known as the curse of dimensionality. Environments can be modeled more compactly by taking advantage of a structured representation: Factored Markov Decision Processes (FMDPs) serve this purpose, and on top of this structure, Factored Reinforcement Learning (FRL) methods are applied to exploit the structured representation. However, even this approach may not be sufficient for large-scale problems: with a huge number of states and actions to consider, the learning process requires more time and resources.

In this thesis, we propose a compact factored structure to address this problem. Automatic detection and use of temporal abstractions during learning has proven to be an effective way to increase learning speed: repeating patterns are found in different parts of the problem, and a common sub-policy is reused for all of them instead of exploring the same solution again and again. The Extended Sequence Tree (EST) algorithm is an automatic temporal abstraction detection technique that uses the history of states and actions to store frequently used patterns in a structured manner and offers alternative actions to the underlying RL algorithm. In this work, we propose a factored automatic temporal abstraction method based on the extended sequence tree that captures state differences through state-variable changes between successive states.
The aim is to store useful portions of the history more compactly and avoid excessive memory usage. The proposed method has been shown to provide significant memory gains on selected benchmark problems.
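As a minimal illustration of the factored representation the abstract refers to (not the thesis implementation, and with hypothetical state variables chosen for the example), the sketch below contrasts a flat enumeration of the joint state space with a factored state given by variable assignments, where the change between successive states can be stored as a sparse diff:

```python
from itertools import product

# Hypothetical state variables for a small navigation task (illustrative only).
variables = {
    "x": range(10),            # agent column
    "y": range(10),            # agent row
    "has_key": (False, True),
    "door_open": (False, True),
}

# Flat representation: one entry per joint state; grows exponentially
# with the number of variables (the curse of dimensionality).
flat_states = list(product(*variables.values()))
print(len(flat_states))  # 10 * 10 * 2 * 2 = 400 joint states

# Factored representation: a state is an assignment to variables, and the
# difference between successive states is a sparse set of changed variables,
# which is the kind of compactness the factored EST exploits.
s = {"x": 3, "y": 5, "has_key": False, "door_open": False}
s_next = {"x": 3, "y": 5, "has_key": True, "door_open": False}
diff = {k: v for k, v in s_next.items() if s[k] != v}
print(diff)  # {'has_key': True}
```

Storing only such diffs along frequently repeated state-action histories, rather than full states, is what allows the structure to remain compact as the number of state variables grows.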