Learning by Automatic Option Discovery from Conditionally Terminating Sequences

2006-08-28
Girgin, Sertan
Polat, Faruk
Alhajj, Reda
This paper proposes a novel approach to discovering options in the form of conditionally terminating sequences, and shows how they can be integrated into the reinforcement learning framework to improve learning performance. The method uses stored histories of possible optimal policies and constructs a specialized tree structure online in order to identify frequently used action sequences, together with the states visited during the execution of such sequences. The tree is then used to implicitly run the corresponding options. The effectiveness of the method is demonstrated empirically.
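
As a rough illustration of the idea in the abstract, the sketch below builds a prefix tree over action sequences taken from stored histories and records which states were seen along each sequence. It is a minimal sketch under stated assumptions, not the authors' algorithm: the names `SequenceNode`, `build_tree`, `frequent_sequences`, `max_len`, and `min_count` are hypothetical, and a simple count threshold stands in for whatever selection criterion the paper actually uses.

```python
class SequenceNode:
    """One node of a prefix tree over action sequences (hypothetical structure)."""

    def __init__(self):
        self.children = {}   # action -> SequenceNode
        self.count = 0       # number of stored subsequences passing through this node
        self.states = set()  # states in which the action leading to this node was taken


def build_tree(histories, max_len):
    """Insert every action subsequence of length <= max_len from each history.

    Each history is a list of (state, action) pairs. Assumption: histories
    are already filtered to those judged likely to follow a good policy.
    """
    root = SequenceNode()
    for history in histories:
        for start in range(len(history)):
            node = root
            for state, action in history[start:start + max_len]:
                node = node.children.setdefault(action, SequenceNode())
                node.count += 1
                node.states.add(state)
    return root


def frequent_sequences(root, min_count):
    """Return {action_tuple: (count, states)} for sequences seen >= min_count times.

    Counts never increase with depth, so infrequent branches can be pruned.
    """
    found = {}
    stack = [((), root)]
    while stack:
        prefix, node = stack.pop()
        for action, child in node.children.items():
            if child.count >= min_count:
                sequence = prefix + (action,)
                found[sequence] = (child.count, child.states)
                stack.append((sequence, child))
    return found
```

Calling `frequent_sequences(build_tree(histories, max_len=3), min_count=2)` on a list of `(state, action)` histories would return the candidate action sequences along with the states in which each step was observed; an agent could then treat each such sequence as an option that continues only while the current state belongs to the recorded set.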

Suggestions

Positive impact of state similarity on reinforcement learning performance
Girgin, Sertan; Polat, Faruk; Alhajj, Reda (Institute of Electrical and Electronics Engineers (IEEE), 2007-10-01)
In this paper, we propose a novel approach to identify states with similar subpolicies and show how they can be integrated into the reinforcement learning framework to improve learning performance. The method utilizes a specialized tree structure to identify common action sequences of states, which are derived from possible optimal policies, and defines a similarity function between two states based on the number of such sequences. Using this similarity function, updates on the action-value function of a st...
Learning semi-supervised nonlinear embeddings for domain-adaptive pattern recognition
Vural, Elif (2019-05-20)
We study the problem of learning nonlinear data embeddings in order to obtain representations for efficient and domain-invariant recognition of visual patterns. Given observations of a training set of patterns from different classes in two different domains, we propose a method to learn a nonlinear mapping of the data samples from different domains into a common domain. The nonlinear mapping is learnt such that the class means of different domains are mapped to nearby points in the common domain in order to...
Control of a differentially driven mobile robot using radial basis function based neural networks
Bayar, Gökhan; Konukseven, Erhan İlhan; Buğra Koku, A. (2008-12-01)
This paper proposes the use of a radial basis function neural network approach to adjust the orientation of a mobile robot using reinforcement learning. In order to control the orientation of the mobile robot, a neural network control system has been constructed and implemented. The neural controller is tasked with enhancing the control system by adding some degree of reward. Making use of the potential of neural networks to learn the relationships, the desired reference orientation and the error...
Recursive Compositional Reinforcement Learning for Continuous Control (Sürekli Kontrol Uygulamaları için Özyinelemeli Bileşimsel Pekiştirmeli Öğrenme)
Tanik, Guven Orkun; Ertekin Bolelli, Şeyda (2022-01-01)
Compositional and temporal abstraction is the key to improving learning and planning in reinforcement learning. Modern real-world control problems call for continuous control domains and robust, sample-efficient, and explainable control frameworks. We present a framework for recursively composing control skills to solve compositional and progressively complex tasks. The framework promotes reuse of skills and, as a result, is quickly adaptable to new tasks. The decision-tree can be observed, pr...
Simple and complex behavior learning using behavior hidden Markov Model and CobART
Seyhan, Seyit Sabri; Alpaslan, Ferda Nur; Department of Computer Engineering (2013)
In this thesis, behavior learning and generation models are proposed for simple and complex behaviors of robots using unsupervised learning methods. Simple behaviors are modeled by simple-behavior learning model (SBLM) and complex behaviors are modeled by complex-behavior learning model (CBLM) which uses previously learned simple or complex behaviors. Both models have common phases named behavior categorization, behavior modeling, and behavior generation. Sensory data are categorized using correlation based...
Citation Formats
S. Girgin, F. Polat, and R. Alhajj, “Learning by Automatic Option Discovery from Conditionally Terminating Sequences,” 2006, vol. 141, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/53539.