Reward Shaping for Efficient Exploration and Acceleration of Learning in Reinforcement Learning
Date
2022-07-21
Author
Bal, Melis İlayda
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Item Usage Stats
605 views, 143 downloads
Abstract
In a Reinforcement Learning task, a learning agent needs to extract useful information about its uncertain environment efficiently during the interaction process in order to complete the task successfully. Through strategic exploration, the agent acquires sufficient information to adjust its behavior and act intelligently as it interacts with the environment. Efficient exploration therefore plays a key role in the learning efficiency of Reinforcement Learning tasks. Due to the delayed-feedback nature of Reinforcement Learning settings with sparse explicit reward structures, the time required for learning becomes the main cause of learning inefficiency. This problem is exacerbated in complex tasks with large state and action spaces. Decomposing the task, or modifying the reward structure to provide frequent feedback to the agent, has been shown to accelerate learning. This thesis proposes two methods with a reward shaping mechanism to address these problems. To address the efficient-exploration problem, a framework called the population-based repulsive reward shaping mechanism, which uses eligibility traces, is proposed within the scope of tabular RL representations. A computational study on benchmark problem domains showed that the proposed framework achieves efficient exploration, with significant improvements in learning performance and state-space coverage. Furthermore, to accelerate learning, the thesis proposes an approach called potential-based reward shaping using state-space segmentation with the extended segmented Q-Cut algorithm. Experimental results on sparse-reward benchmark domains showed that the proposed method indeed speeds up the agent's learning without sacrificing computation time.
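The potential-based reward shaping named in the abstract adds a term F(s, s') = γΦ(s') − Φ(s) to the environment reward, which is known to preserve the optimal policy while densifying feedback. The sketch below illustrates the general technique only, not the thesis's segmentation-based method: the chain MDP, the potential function, and all hyperparameters are assumptions made for the example.

```python
import random

# Minimal sketch of potential-based reward shaping on a sparse-reward
# chain MDP (a toy setup assumed for illustration; the thesis instead
# derives potentials from state-space segmentation via segmented Q-Cut).
# The shaping term F(s, s') = gamma * phi(s') - phi(s) is added to the
# environment reward and provably preserves the optimal policy.

GAMMA = 0.95        # discount factor
ALPHA = 0.1         # learning rate
EPSILON = 0.2       # exploration rate
N_STATES = 10       # chain MDP: states 0..9, reward only at state 9
ACTIONS = (-1, +1)  # move left / move right

def phi(s):
    """Hypothetical potential: normalized progress toward the goal."""
    return s / (N_STATES - 1)

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0  # sparse goal reward
    return s2, reward, s2 == N_STATES - 1

def train(episodes=500, shaped=True):
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < EPSILON:           # epsilon-greedy
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            if shaped:
                r += GAMMA * phi(s2) - phi(s)       # shaping term F(s, s')
            target = r if done else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2
    return Q

random.seed(0)
Q = train()
# Check whether the greedy action in every non-terminal state points
# toward the goal, i.e. the shaped agent learned the optimal policy.
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(N_STATES - 1)))
```

Because the shaping term telescopes along any trajectory, the shaped and unshaped problems share the same optimal policy; the shaped agent simply receives informative intermediate feedback instead of waiting for the single goal reward.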
Subject Keywords
reinforcement learning, coordinated exploration, eligibility traces, potential-based reward shaping, state-space segmentation
URI
https://hdl.handle.net/11511/98151
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
Bipedal Robot Walking by Reinforcement Learning in Partially Observed Environment
Özalp, Uğurcan; Uğur, Ömür; Department of Scientific Computing (2021-8-27)
Deep Reinforcement Learning methods on mechanical control have been successfully applied in many environments and used instead of traditional optimal and adaptive control methods for some complex problems. However, Deep Reinforcement Learning algorithms do still have some challenges. One is to control on partially observable environments. When an agent is not informed well of the environment, it must recover information from the past observations. In this thesis, walking of Bipedal Walker Hardcore (Open...
Online collaboration: Collaborative behavior patterns and factors affecting globally distributed team performance
Serce, Fatma Cemile; Swigger, Kathleen; Alpaslan, Ferda Nur; Brazile, Robert; Dafoulas, George; Lopez, Victor (2011-01-01)
Studying the collaborative behavior of online learning teams and how this behavior is related to communication mode and task type is a complex process. Research about small group learning suggests that a higher percentage of social interactions occur in synchronous rather than asynchronous mode, and that students spend more time in task-oriented interaction in asynchronous discussions than in synchronous mode. This study analyzed the collaborative interaction patterns of global software development learning...
Positive impact of state similarity on reinforcement learning performance
Girgin, Sertan; Polat, Faruk; Alhaj, Reda (Institute of Electrical and Electronics Engineers (IEEE), 2007-10-01)
In this paper, we propose a novel approach to identify states with similar subpolicies and show how they can be integrated into the reinforcement learning framework to improve learning performance. The method utilizes a specialized tree structure to identify common action sequences of states, which are derived from possible optimal policies, and defines a similarity function between two states based on the number of such sequences. Using this similarity function, updates on the action-value function of a st...
Relational-Grid-World: A Novel Relational Reasoning Environment and An Agent Model for Relational Information Extraction
Kucuksubasi, Faruk; Sürer, Elif (2020-07-01)
Reinforcement learning (RL) agents are often designed specifically for a particular problem and they generally have uninterpretable working processes. Statistical methods-based agent algorithms can be improved in terms of generalizability and interpretability using symbolic Artificial Intelligence (AI) tools such as logic programming. In this study, we present a model-free RL architecture that is supported with explicit relational representations of the environmental objects. For the first time, we use the ...
Linearization-based attitude error regulation: multiplicative error case
Doruk, R. Ozgur (2009-01-01)
Purpose - The purpose of this paper is to design and simulate a linearized attitude stabilizer based on linear quadratic regulator theory (LQR) using the multiplicative definition of the attitude.
Citation (IEEE)
M. İ. Bal, “Reward Shaping for Efficient Exploration and Acceleration of Learning in Reinforcement Learning,” M.S. - Master of Science, Middle East Technical University, 2022.