Show/Hide Menu
Hide/Show Apps
Logout
Türkçe
Türkçe
Search
Search
Login
Login
OpenMETU
OpenMETU
About
About
Open Science Policy
Open Science Policy
Open Access Guideline
Open Access Guideline
Postgraduate Thesis Guideline
Postgraduate Thesis Guideline
Communities & Collections
Communities & Collections
Help
Help
Frequently Asked Questions
Frequently Asked Questions
Guides
Guides
Thesis submission
Thesis submission
MS without thesis term project submission
MS without thesis term project submission
Publication submission with DOI
Publication submission with DOI
Publication submission
Publication submission
Supporting Information
Supporting Information
General Information
General Information
Copyright, Embargo and License
Copyright, Embargo and License
Contact us
Contact us
Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states
Date
2022-08-01
Author
Aydın, Hüseyin
Çilden, Erkin
Polat, Faruk
Metadata
Show full item record
This work is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
.
Item Usage Stats
249
views
0
downloads
Cite This
Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that are not solvable by conventional methods turn out to be easier to handle by using a convenient problem decomposition, followed by the incorporation of machine learning methods for the sub-problems. Like in most real-life problems, the decomposition of a task usually stems from the sequence of sub-tasks that must be achieved in order to get the main task done. In this study, assuming that unambiguous states are provided in advance, a decomposition of the problem is constructed by the agent based on a set of chains of bottleneck transitions, which are sequences of unambiguous and critical transitions leading to the goal state. At the higher level, an agent trains its sub-agents to extract sub-policies corresponding to the sub-tasks, namely two successive transitions in any chain, and learns the value of each sub-policy at the abstract level. Experimental study demonstrates that an early decomposition based on useful bottleneck transitions eliminates the necessity for excessive memory and improves the learning performance of the agent. It is also shown that knowing the correct order of bottleneck transitions in the decomposition results in faster construction of the solution.
Subject Keywords
Chains of bottleneck transitions
,
Reinforcement learning
,
Task decomposition
URI
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85126917767&origin=inward
https://hdl.handle.net/11511/97170
Journal
Future Generation Computer Systems
DOI
https://doi.org/10.1016/j.future.2022.03.016
Collections
Department of Computer Engineering, Article
Suggestions
OpenMETU
Core
Using Transitional Bottlenecks to Improve Learning in Nearest Sequence Memory Algorithm
Aydın, Hüseyin; Polat, Faruk (2017-11-08)
Instance-based methods are proven tools to solve reinforcement learning problems with hidden states. Nearest Sequence Memory (NSM) is a widely known instance-based approach mainly based on k-Nearest Neighbor algorithm. It keeps the history of the agent in terms of action-observation-reward tuples and uses it to vote for the best upcoming action. In this work, an improving heuristic is proposed for the NSM algorithm which provides the agent an additional prior information, namely transitional bottlenecks, on...
Using Frequencies of Transitions to Improve Reinforcement Learning with Hidden States
Aydın, Hüseyin; Polat, Faruk; Çilden, Erkin; Department of Computer Engineering (2022-8)
Reinforcement learning problems with hidden states suffer from the ambiguity of the environment, since the ambiguity in the agent's perception may prevent the agent from estimating its current state correctly. Therefore, constructing a solution without using an external memory may be extremely difficult or even impossible sometimes. In an ambiguous environment, frequencies of the transitions can provide more reliable information and hence it may lead us to construct more efficient and effective memory inst...
A Concept Filtering Approach for Diverse Density to Discover Subgoals in Reinforcement Learning
DEMİR, ALPER; Cilden, Erkin; Polat, Faruk (2017-11-08)
In the reinforcement learning context, subgoal discovery methods aim to find bottlenecks in problem state space so that the problem can naturally be decomposed into smaller subproblems. In this paper, we propose a concept filtering method that extends an existing subgoal discovery method, namely diverse density, to be used for both fully and partially observable RL problems. The proposed method is successful in discovering useful subgoals with the help of multiple instance learning. Compared to the original...
Improving reinforcement learning using distinctive clues of the environment
Demir, Alper; Polat, Faruk; Department of Computer Engineering (2019)
Effective decomposition and abstraction has been shown to improve the performance of Reinforcement Learning. An agent can use the clues from the environment to either partition the problem into sub-problems or get informed about its progress in a given task. In a fully observable environment such clues may come from subgoals while in a partially observable environment they may be provided by unique experiences. The contribution of this thesis is two fold; first improvements over automatic subgoal identifica...
Local Roots A Tree Based Subgoal Discovery Method to Accelerate Reinforcement Learning
Demir, Alper; Polat, Faruk; Cilden, Erkin (2016-12-04)
Subgoal discovery in reinforcement learning is an effective way of partitioning a problem domain with large state space. Recent research mainly focuses on automatic identification of such subgoals during learning, making use of state transition information gathered during exploration. Mostly based on the options framework, an identified subgoal leads the learning agent to an intermediate region which is known to be useful on the way to goal. In this paper, we propose a novel automatic subgoal discovery meth...
Citation Formats
IEEE
ACM
APA
CHICAGO
MLA
BibTeX
H. Aydın, E. Çilden, and F. Polat, “Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states,”
Future Generation Computer Systems
, vol. 133, pp. 153–168, 2022, Accessed: 00, 2022. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85126917767&origin=inward.