Landmark based guidance for reinforcement learning agents under partial observability
Date
2022-01-01
Author
Demir, Alper
Çilden, Erkin
Polat, Faruk
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Item Usage Stats: 99 views, 0 downloads
Under partial observability, a reinforcement learning agent needs to estimate its true state using only the semantics of its observations. This interpretation has a drawback, known as perceptual aliasing, which voids the convergence guarantee of the learning algorithm. To overcome this issue, state estimates are formed from the agent's recent experiences, which can be formulated as a form of memory. Although these state estimates may still yield ambiguous action mappings due to aliasing, some estimates naturally disambiguate the agent's present situation in the domain. This paper introduces an algorithm that incorporates a guidance mechanism to accelerate reinforcement learning for partially observable problems with hidden states. The algorithm makes use of the landmarks of the problem, namely the distinctive and reliable experiences, in the state-estimate sense, within an ambiguous environment. The proposed algorithm constructs an abstract transition model from the observed landmarks, calculates their potentials throughout learning (a mechanism borrowed from reward shaping), and concurrently applies these potentials to provide guiding rewards for the agent. Additionally, we employ a well-known multiple instance learning method, diverse density, to automatically discover landmarks before learning, and combine both algorithms into a unified framework. The effectiveness of the algorithms is shown empirically via extensive experimentation. The results show that the proposed framework not only accelerates the underlying reinforcement learning methods, but also finds better policies for representative benchmark problems.
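The guidance described above follows the standard potential-based reward shaping scheme, in which a bonus F(x, x') = γΦ(x') − Φ(x) is added to the environment reward so that the optimal policy is preserved. The following Python sketch illustrates how landmark potentials could be applied on top of Q-learning over memory-based state estimates; the class name LandmarkGuidedAgent, the potential table, and the way it is filled are illustrative assumptions, not the authors' implementation.

import random
from collections import defaultdict

class LandmarkGuidedAgent:
    # Illustrative sketch: Q-learning over memory-based state estimates with a
    # potential-based shaping bonus derived from landmark potentials.
    def __init__(self, actions, landmarks, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = list(actions)
        self.landmarks = set(landmarks)        # distinctive, reliable experiences
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)            # Q[(estimate, action)]
        self.potential = defaultdict(float)    # Phi[estimate]; assumed to be refreshed
                                               # from the abstract landmark transition model

    def phi(self, estimate):
        # Only landmark estimates carry a learned potential; all others contribute 0.
        return self.potential[estimate] if estimate in self.landmarks else 0.0

    def act(self, estimate):
        # Epsilon-greedy action selection over the current state estimate.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(estimate, a)])

    def update(self, estimate, action, reward, next_estimate):
        # Potential-based shaping: F = gamma * Phi(s') - Phi(s), added to the reward.
        shaped = reward + self.gamma * self.phi(next_estimate) - self.phi(estimate)
        best_next = max(self.q[(next_estimate, a)] for a in self.actions)
        td_error = shaped + self.gamma * best_next - self.q[(estimate, action)]
        self.q[(estimate, action)] += self.alpha * td_error

In the paper's setting the potentials would be computed from the abstract transition model built over the observed landmarks (for example, by value iteration on that model); here they are simply read from a table to keep the sketch self-contained.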
Subject Keywords
Diverse density, Landmark based guidance, Partial observability, Reinforcement learning, Temporal abstraction, Framework
URI
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85141991253&origin=inward
https://hdl.handle.net/11511/101698
Journal
International Journal of Machine Learning and Cybernetics
DOI
https://doi.org/10.1007/s13042-022-01713-5
Collections
Department of Computer Engineering, Article
Suggestions
EMDD-RL: faster subgoal identification with diverse density in reinforcement learning
Sunel, Saim; Polat, Faruk; Department of Computer Engineering (2021-1-15)
The Diverse Density (DD) algorithm is a well-known multiple instance learning method, also known to be effective for automatically identifying sub-goals and improving Reinforcement Learning (RL). Expectation-Maximization Diverse Density (EMDD) improves DD in terms of both speed and accuracy. This study adapts EMDD to automatically identify subgoals for RL, and the method is shown to perform significantly faster (3 to 10 times) than its predecessor without sacrificing solution quality. The performance of the proposed method na...
Using chains of bottleneck transitions to decompose and solve reinforcement learning tasks with hidden states
Aydın, Hüseyin; Çilden, Erkin; Polat, Faruk (2022-08-01)
Reinforcement learning is known to underperform in large and ambiguous problem domains under partial observability. In such cases, a proper decomposition of the task can improve and accelerate the learning process. Even ambiguous and complex problems that are not solvable by conventional methods turn out to be easier to handle by using a convenient problem decomposition, followed by the incorporation of machine learning methods for the sub-problems. As in most real-life problems, the decomposition of a ta...
Improving reinforcement learning using distinctive clues of the environment
Demir, Alper; Polat, Faruk; Department of Computer Engineering (2019)
Effective decomposition and abstraction have been shown to improve the performance of Reinforcement Learning. An agent can use the clues from the environment either to partition the problem into sub-problems or to get informed about its progress in a given task. In a fully observable environment such clues may come from subgoals, while in a partially observable environment they may be provided by unique experiences. The contribution of this thesis is twofold: first, improvements over automatic subgoal identifica...
Compact Frequency Memory for Reinforcement Learning with Hidden States.
Polat, Faruk; Cilden, Erkin (2019-10-28)
Memory-based reinforcement learning approaches keep track of past experiences of the agent in environments with hidden states. This may require extensive use of memory, which limits the practicality of these methods in real-life problems. The motivation behind this study is the observation that less frequent transitions provide more reliable information about the current state of the agent in ambiguous environments. In this work, a selective memory approach based on the frequencies of transitions is proposed to ...
A Concept Filtering Approach for Diverse Density to Discover Subgoals in Reinforcement Learning
DEMİR, ALPER; Cilden, Erkin; Polat, Faruk (2017-11-08)
In the reinforcement learning context, subgoal discovery methods aim to find bottlenecks in problem state space so that the problem can naturally be decomposed into smaller subproblems. In this paper, we propose a concept filtering method that extends an existing subgoal discovery method, namely diverse density, to be used for both fully and partially observable RL problems. The proposed method is successful in discovering useful subgoals with the help of multiple instance learning. Compared to the original...
Citation Formats
IEEE
A. Demir, E. Çilden, and F. Polat, “Landmark based guidance for reinforcement learning agents under partial observability,” International Journal of Machine Learning and Cybernetics, pp. 0–0, 2022, Accessed: 00, 2023. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85141991253&origin=inward.