Automatic identification of transitional bottlenecks in reinforcement learning under partial observability

Aydın, Hüseyin
Instance-based methods are proven tools to solve reinforcement learning problems with hidden states. Nearest Sequence Memory (NSM) is a widely known instance-based approach mainly based on k-Nearest Neighbor algorithm. NSM keeps track of raw history of action-observation-reward instances within a fixed length (or ideally unlimited) memory. It calculates the neighborhood for the current state through a recursive comparison of the matching action-observation-reward tuples with the previous ones. The ones with the highest short-term history overlap are assumed to be the nearest neighbors of the current state, and the information required to lead the agent to the highest expected reward are extracted via averaging the action values among the matches. In this thesis, an indexing method is proposed to avoid redundant comparisons of tuples to identify matching histories for NSM, via a hashing mechanism. Experiments show that a significant improvement has been achieved in terms of time complexity while the learning performance is preserved. Furthermore, an improving heuristic is proposed for the NSM algorithm which provides the agent an additional prior information, namely transitional bottlenecks, on the way to goal. Additionally, a tuple extension pattern is shown to further improve the heuristic by means of ambiguity reduction due to the nature of transitional bottlenecks, thus increase the learning speed. Finally, NSM is combined with Diverse Density, so that identification of useful transitional bottlenecks can be automatized. The experimentation shows that this combination achieves to find the qualified transitional bottlenecks without any prior information fed into agent. 
Citation Formats
H. Aydın, “Automatic identification of transitional bottlenecks in reinforcement learning under partial observability,” M.S. - Master of Science, Middle East Technical University, 2017.