Reward Shaping for Efficient Exploration and Acceleration of Learning in Reinforcement Learning
Date
2022-07-21
Author
Bal, Melis İlayda
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Item Usage Stats
605 views, 143 downloads
Abstract
In a Reinforcement Learning task, a learning agent needs to extract useful information about its uncertain environment efficiently during the interaction process in order to complete the task successfully. Through strategic exploration, the agent acquires sufficient information to adjust its behavior and act intelligently as it interacts with the environment. Efficient exploration therefore plays a key role in the learning efficiency of Reinforcement Learning tasks. Due to the delayed-feedback nature of Reinforcement Learning settings with sparse explicit reward structures, the time required for learning becomes the main cause of learning inefficiency. This problem is exacerbated in complex tasks with large state and action spaces. Decomposing the task, or modifying the reward structure to provide frequent feedback to the agent, has been shown to accelerate learning. This thesis proposes two methods with a reward shaping mechanism to address these problems. To address the efficient-exploration problem, a framework called the population-based repulsive reward shaping mechanism, which uses eligibility traces, is proposed within the scope of tabular RL representations. A computational study on benchmark problem domains showed that the proposed framework achieves efficient exploration, with significant improvements in learning performance and state-space coverage. Furthermore, to accelerate learning, the thesis proposes an approach called potential-based reward shaping using state-space segmentation with the extended segmented Q-Cut algorithm. Experimental results on sparse-reward benchmark domains showed that the proposed method indeed speeds up the agent's learning without sacrificing computation time.
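The potential-based reward shaping named in the abstract adds a term F(s, s') = γΦ(s') − Φ(s) to the environment reward, which is known to preserve the optimal policy while densifying feedback. The sketch below illustrates the general technique only, not the thesis's segmentation-based method: the chain MDP, the potential function, and all hyperparameters are assumptions made for the example.

```python
import random

# Minimal sketch of potential-based reward shaping on a sparse-reward
# chain MDP (a toy setup assumed for illustration; the thesis instead
# derives potentials from state-space segmentation via segmented Q-Cut).
# The shaping term F(s, s') = gamma * phi(s') - phi(s) is added to the
# environment reward and provably preserves the optimal policy.

GAMMA = 0.95        # discount factor
ALPHA = 0.1         # learning rate
EPSILON = 0.2       # exploration rate
N_STATES = 10       # chain MDP: states 0..9, reward only at state 9
ACTIONS = (-1, +1)  # move left / move right

def phi(s):
    """Hypothetical potential: normalized progress toward the goal."""
    return s / (N_STATES - 1)

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0  # sparse goal reward
    return s2, reward, s2 == N_STATES - 1

def train(episodes=500, shaped=True):
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < EPSILON:           # epsilon-greedy
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            if shaped:
                r += GAMMA * phi(s2) - phi(s)       # shaping term F(s, s')
            target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2
    return Q

random.seed(0)
Q = train()
# Check whether the greedy action in every non-terminal state points
# toward the goal, i.e. the shaped agent learned the optimal policy.
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(N_STATES - 1)))
```

Because the shaping term telescopes along any trajectory, the shaped and unshaped problems share the same optimal policy; the shaped agent simply receives informative intermediate feedback instead of waiting for the single goal reward.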
Subject Keywords
reinforcement learning, coordinated exploration, eligibility traces, potential-based reward shaping, state-space segmentation
URI
https://hdl.handle.net/11511/98151
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
Bipedal Robot Walking by Reinforcement Learning in Partially Observed Environment
Özalp, Uğurcan; Uğur, Ömür; Department of Scientific Computing (2021-8-27)
Deep Reinforcement Learning methods on mechanical control have been successfully applied in many environments and used instead of traditional optimal and adaptive control methods for some complex problems. However, Deep Reinforcement Learning algorithms do still have some challenges. One is to control on partially observable environments. When an agent is not informed well of the environment, it must recover information from the past observations. In this thesis, walking of Bipedal Walker Hardcore (Open...
Online collaboration: Collaborative behavior patterns and factors affecting globally distributed team performance
Serce, Fatma Cemile; Swigger, Kathleen; Alpaslan, Ferda Nur; Brazile, Robert; Dafoulas, George; Lopez, Victor (2011-01-01)
Studying the collaborative behavior of online learning teams and how this behavior is related to communication mode and task type is a complex process. Research about small group learning suggests that a higher percentage of social interactions occur in synchronous rather than asynchronous mode, and that students spend more time in task-oriented interaction in asynchronous discussions than in synchronous mode. This study analyzed the collaborative interaction patterns of global software development learning...
Positive impact of state similarity on reinforcement learning performance
Girgin, Sertan; Polat, Faruk; Alhaj, Reda (Institute of Electrical and Electronics Engineers (IEEE), 2007-10-01)
In this paper, we propose a novel approach to identify states with similar subpolicies and show how they can be integrated into the reinforcement learning framework to improve learning performance. The method utilizes a specialized tree structure to identify common action sequences of states, which are derived from possible optimal policies, and defines a similarity function between two states based on the number of such sequences. Using this similarity function, updates on the action-value function of a st...
Relational-Grid-World: A Novel Relational Reasoning Environment and An Agent Model for Relational Information Extraction
Kucuksubasi, Faruk; Sürer, Elif (2020-07-01)
Reinforcement learning (RL) agents are often designed specifically for a particular problem and they generally have uninterpretable working processes. Statistical methods-based agent algorithms can be improved in terms of generalizability and interpretability using symbolic Artificial Intelligence (AI) tools such as logic programming. In this study, we present a model-free RL architecture that is supported with explicit relational representations of the environmental objects. For the first time, we use the ...
Linearization-based attitude error regulation: multiplicative error case
Doruk, R. Ozgur (2009-01-01)
Purpose - The purpose of this paper is to design and simulate a linearized attitude stabilizer based on linear quadratic regulator theory (LQR) using the multiplicative definition of the attitude.
Citation (IEEE)
M. İ. Bal, “Reward Shaping for Efficient Exploration and Acceleration of Learning in Reinforcement Learning,” M.S. - Master of Science, Middle East Technical University, 2022.