Using Frequencies of Transitions to Improve Reinforcement Learning with Hidden States

2022-8
Aydın, Hüseyin
Reinforcement learning problems with hidden states suffer from the ambiguity of the environment, since the ambiguity in the agent's perception may prevent the agent from estimating its current state correctly. Therefore, constructing a solution without using an external memory may be extremely difficult or even impossible sometimes. In an ambiguous environment, frequencies of the transitions can provide more reliable information and hence it may lead us to construct more efficient and effective memory instead of keeping all experiences of the agent like the existing memory-based methods. Inspired by this observation, a selective memory approach based on the frequencies of transitions is proposed in the first part of thesis. The agents with any reinforcement learning method can be equipped with this selective memory, since the memory itself does not have any constraints on the underlying method. Experiments show that a compact and selective memory may improve and speed up the learning on both Q-Learning and Sarsa(λ) methods. As the second part of the work, sequential association between transitions is used in order to get a solution in more abstract manner for the problems which can be decomposed by using the bottlenecks in the environment. A simple recursive method is proposed for automatic extraction the set of chains of bottleneck transitions which are sequences of unambiguous and critical transitions leading to the goal state. At the higher level, an agent trains its sub-agents to extract sub-policies corresponding to the sub-tasks, namely two successive transitions in any chain, and learns the value of each sub-policy at the abstract level. Experimentation shows that this approach learns better and faster in the ambiguous domains where conventional methods fail to construct a solution. Furthermore, it has been shown that our method with its early decomposition approach performs better than a memory-based method in terms of quality of the learning, speed and memory usage. Finally, Diverse Density method is integrated with the proposed method to complete the autonomy of the overall process. Although, identified landmarks are not completely accurate, experimentation shows that the results are promising.

Suggestions

Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states
Aydın, Hüseyin; Çilden, Erkin; Polat, Faruk (2022-08-01)
Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that are not solvable by conventional methods turn out to be easier to handle by using a convenient problem decomposition, followed by the incorporation of machine learning methods for the sub-problems. Like in most real-life problems, the decomposition of a ta...
Compact Frequency Memory for Reinforcement Learning with Hidden States.
Polat, Faruk; Cilden, Erkin (2019-10-28)
Memory-based reinforcement learning approaches keep track of past experiences of the agent in environments with hidden states. This may require extensive use of memory that limits the practice of these methods in a real-life problem. The motivation behind this study is the observation that less frequent transitions provide more reliable information about the current state of the agent in ambiguous environments. In this work, a selective memory approach based on the frequencies of transitions is proposed to ...
Using Transitional Bottlenecks to Improve Learning in Nearest Sequence Memory Algorithm
Aydın, Hüseyin; Polat, Faruk (2017-11-08)
Instance-based methods are proven tools to solve reinforcement learning problems with hidden states. Nearest Sequence Memory (NSM) is a widely known instance-based approach mainly based on k-Nearest Neighbor algorithm. It keeps the history of the agent in terms of action-observation-reward tuples and uses it to vote for the best upcoming action. In this work, an improving heuristic is proposed for the NSM algorithm which provides the agent an additional prior information, namely transitional bottlenecks, on...
A method for detecting RGPO/VGPO jamming
Kural, F; Ozkanzanc, Y (2004-04-30)
This paper proposes a general approach to detect the presence of deceptive counter measures such as RGPO or VGPO. The method which is based on Kalman filter lays its foundation on drift between Doppler measurement and the Doppler information obtained from the range measurement of a target in track. By measuring this drift by Mahalonobis distance metric and comparing it with a predefined threshold; detection is performed. The performance of the proposed approach is illustrated through a simulation.
A parallel between regret theory and outranking methods for multicriteria decision making under imprecise information
Ozerol, Gul; Karasakal, Esra (Springer Science and Business Media LLC, 2008-08-01)
Incorporation of the behavioral issues of the decision maker (DM) is among the aspects that each Multicriteria Decision Making (MCDM) method implicitly or explicitly takes into account. As postulated by regret theory, the feelings of regret and rejoice are among the behavioral issues associated with the entire decision making process. Within the context of MCDM, the DM may feel regret, when the chosen alternative is compared with another one having at least one better criterion value. PROMETHEE II is a wide...
Citation Formats
H. Aydın, “Using Frequencies of Transitions to Improve Reinforcement Learning with Hidden States,” Ph.D. - Doctoral Program, Middle East Technical University, 2022.