Positive impact of state similarity on reinforcement learning performance

Date

2007-10-01

Author

Girgin, Sertan
Polat, Faruk
Alhaj, Reda

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

In this paper, we propose a novel approach to identify states with similar subpolicies and show how they can be integrated into the reinforcement learning framework to improve learning performance. The method utilizes a specialized tree structure to identify common action sequences of states, which are derived from possible optimal policies, and defines a similarity function between two states based on the number of such sequences. Using this similarity function, updates on the action-value function of a state are reflected onto all similar states. This allows experience that is acquired during learning to be applied to a broader context. The effectiveness of the method is demonstrated empirically.
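The idea of reflecting an action-value update onto similar states can be illustrated with a minimal sketch. It assumes a tabular Q-learning agent and a precomputed similarity map with weights in [0, 1]; the function name `q_update_with_similarity`, the `similarity` dictionary layout, and the rule of scaling the temporal-difference error by the similarity weight are all illustrative assumptions, not the update rule or tree-based similarity computation from the paper.

```python
from collections import defaultdict


def q_update_with_similarity(Q, similarity, s, a, r, s_next, actions,
                             alpha=0.1, gamma=0.9):
    """Apply one Q-learning update to (s, a), then reflect it onto similar states.

    Q          : mapping (state, action) -> value, e.g. defaultdict(float)
    similarity : mapping state -> {similar_state: weight in [0, 1]}
                 (hypothetical layout; the paper derives similarity from
                 common action sequences, which is not reproduced here)
    """
    # Standard Q-learning temporal-difference error for the visited state.
    best_next = max(Q[(s_next, b)] for b in actions)
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error

    # Assumed propagation rule: similar states receive the same correction,
    # scaled down by their similarity weight.
    for t, weight in similarity.get(s, {}).items():
        Q[(t, a)] += alpha * weight * td_error

    return td_error


# Usage: "s2" is assumed to be half as similar to "s1" as "s1" is to itself.
Q = defaultdict(float)
similarity = {"s1": {"s2": 0.5}}
q_update_with_similarity(Q, similarity, "s1", "right", 1.0, "s3",
                         actions=["left", "right"])
```

In this sketch a single experienced transition changes the estimates of both the visited state and its similar state, which is the sense in which experience is applied to a broader context.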

Subject Keywords

Control and Systems Engineering, Human-Computer Interaction, Electrical and Electronic Engineering, Software, Information Systems, General Medicine, Computer Science Applications

URI

https://hdl.handle.net/11511/46977

Journal

IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)

DOI

https://doi.org/10.1109/tsmcb.2007.899419

Collections

Department of Computer Engineering, Article

Suggestions

Free gait generation with reinforcement learning for a six-legged robot Erden, Mustafa Suphi; Leblebicioğlu, Mehmet Kemal (Elsevier BV, 2008-03-31) In this paper the problems of free gait generation and adaptability with reinforcement learning are addressed for a six-legged robot. Using the developed free gait generation algorithm the robot continues to generate stable gaits according to the commanded velocity. The reinforcement learning scheme incorporated into the free gait generation makes the robot choose more stable states and develop a continuous walking pattern with a larger average stability margin. While walking in normal conditions with no ext...
Experimental Validation of a Feed-Forward Predictor for the Spring-Loaded Inverted Pendulum Template Uyanik, Ismail; Morgul, Omer; Saranlı, Uluç (Institute of Electrical and Electronics Engineers (IEEE), 2015-02-01) Widely accepted utility of simple spring-mass models for running behaviors as descriptive tools, as well as literal control targets, motivates accurate analytical approximations to their dynamics. Despite the availability of a number of such analytical predictors in the literature, their validation has mostly been done in simulation, and it is yet unclear how well they perform when applied to physical platforms. In this paper, we extend on one of the most recent approximations in the literature to ensure it...
A pattern classification approach for boosting with genetic algorithms Yalabık, Ismet; Yarman Vural, Fatoş Tunay; Üçoluk, Göktürk; Şehitoğlu, Onur Tolga (2007-11-09) Ensemble learning is a multiple-classifier machine learning approach that combines collections of statistical classifiers to build a more accurate classifier than the individual classifiers. Bagging, boosting and voting methods are the basic examples of ensemble learning. In this study, a novel boosting technique that targets partial problems of AdaBoost, a well-known boosting algorithm, is proposed. The proposed system finds an elegant way of boosting a bunch of classifiers successively ...
Collaborative and Cognitive Network Platforms: Vision and Research Challenges Onur, Ertan; Hawas, Mohamed Gamal; de Groot, Sonia Marcela Heemstra; Niemegeers, Ignas G. M. M. (Springer Science and Business Media LLC, 2011-05-01) In this paper, we present a visionary concept referred to as Collaborative and Cognitive Network Platforms (CCNPs) as a future-proof solution for creating a dependable, self-organizing and self-managing communication substrate for effective ICT solutions to societal problems. CCNP creates a cooperative communication platform to support critical services across a range of business sectors. CCNP is based on the personal network (PN) technology which is an inherently cooperative environment prototyped in the D...
Multiagent reinforcement learning using function approximation Abul, O; Polat, Faruk; Alhajj, R (Institute of Electrical and Electronics Engineers (IEEE), 2000-11-01) Learning in a partially observable and nonstationary environment is still one of the challenging problems in the area of multiagent (MA) learning. Reinforcement learning is a generic method that suits the needs of MA learning in many aspects. This paper presents two new multiagent based domain independent coordination mechanisms for reinforcement learning; multiple agents do not require explicit communication among themselves to learn coordinated behavior. The first coordination mechanism is perceptual coor...

Citation Formats

S. Girgin, F. Polat, and R. Alhaj, “Positive impact of state similarity on reinforcement learning performance,” IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, pp. 1256–1270, 2007, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/46977.