Automatic identification of transitional bottlenecks in reinforcement learning under partial observability

Download
2017
Aydın, Hüseyin
Instance-based methods are proven tools to solve reinforcement learning problems with hidden states. Nearest Sequence Memory (NSM) is a widely known instance-based approach mainly based on k-Nearest Neighbor algorithm. NSM keeps track of raw history of action-observation-reward instances within a fixed length (or ideally unlimited) memory. It calculates the neighborhood for the current state through a recursive comparison of the matching action-observation-reward tuples with the previous ones. The ones with the highest short-term history overlap are assumed to be the nearest neighbors of the current state, and the information required to lead the agent to the highest expected reward are extracted via averaging the action values among the matches. In this thesis, an indexing method is proposed to avoid redundant comparisons of tuples to identify matching histories for NSM, via a hashing mechanism. Experiments show that a significant improvement has been achieved in terms of time complexity while the learning performance is preserved. Furthermore, an improving heuristic is proposed for the NSM algorithm which provides the agent an additional prior information, namely transitional bottlenecks, on the way to goal. Additionally, a tuple extension pattern is shown to further improve the heuristic by means of ambiguity reduction due to the nature of transitional bottlenecks, thus increase the learning speed. Finally, NSM is combined with Diverse Density, so that identification of useful transitional bottlenecks can be automatized. The experimentation shows that this combination achieves to find the qualified transitional bottlenecks without any prior information fed into agent. 

Suggestions

Using Transitional Bottlenecks to Improve Learning in Nearest Sequence Memory Algorithm
Aydın, Hüseyin; Polat, Faruk (2017-11-08)
Instance-based methods are proven tools to solve reinforcement learning problems with hidden states. Nearest Sequence Memory (NSM) is a widely known instance-based approach mainly based on k-Nearest Neighbor algorithm. It keeps the history of the agent in terms of action-observation-reward tuples and uses it to vote for the best upcoming action. In this work, an improving heuristic is proposed for the NSM algorithm which provides the agent an additional prior information, namely transitional bottlenecks, on...
Abstraction in reinforcement learning in partially observable environments
Çilden, Erkin; Polat, Faruk; Department of Computer Engineering (2014)
Reinforcement learning defines a prominent family of unsupervised machine learning methods in autonomous agents perspective. Markov decision process model provides a solid formal basis for reinforcement learning algorithms. Temporal abstraction mechanisms can be built on reinforcement learning and significant performance gain can be achieved. If the full observability assumption of Markov decision process model is relaxed, the resulting model is partially observable Markov decision process, which constitute...
Effective subgoal discovery and option generation in reinforcement learning
Demir, Alper; Polat, Faruk; Department of Computer Engineering (2016)
Subgoal discovery is proven to be a practical way to cope with large state spaces in Reinforcement Learning. Subgoals are natural hints to partition the problem into sub-problems, allowing the agent to solve each sub-problem separately. Identification of such subgoal states in the early phases of the learning process increases the learning speed of the agent. In a problem modeled as a Markov Decision Process, subgoal states possess key features that distinguish them from the ordinary ones. A learning agent ...
A Heuristic temporal difference approach with adaptive grid discretization
Fikir, Ozan Bora; Polat, Faruk; Department of Computer Engineering (2016)
Reinforcement learning (RL), as an area of machine learning, tackle with the problem defined in an environment where an autonomous agent ought to take actions to achieve an ultimate goal. In RL problems, the environment is typically formulated as a Markov decision process. However, in real life problems, the environment is not flawless to be formulated as an MDP, and we need to relax fully observability assumption of MDP. The resulting model is partially observable Markov decision process, which is a more r...
Simple and complex behavior learning using behavior hidden Markov Model and CobART
Seyhan, Seyit Sabri; Alpaslan, Ferda Nur; Department of Computer Engineering (2013)
In this thesis, behavior learning and generation models are proposed for simple and complex behaviors of robots using unsupervised learning methods. Simple behaviors are modeled by simple-behavior learning model (SBLM) and complex behaviors are modeled by complex-behavior learning model (CBLM) which uses previously learned simple or complex behaviors. Both models have common phases named behavior categorization, behavior modeling, and behavior generation. Sensory data are categorized using correlation based...
Citation Formats
H. Aydın, “Automatic identification of transitional bottlenecks in reinforcement learning under partial observability,” M.S. - Master of Science, Middle East Technical University, 2017.