Automatic landmark discovery for learning agents under partial observability

2019-08-02
Demir, Alper
Cilden, Erkin
Polat, Faruk
In the reinforcement learning context, a landmark is a compact piece of information that uniquely identifies a state, for problems with hidden states. Landmarks have been shown to support finding good memoryless policies for Partially Observable Markov Decision Processes (POMDPs) that contain at least one landmark. SarsaLandmark, an adaptation of Sarsa(lambda), is known to promise better learning performance under the assumption that all landmarks of the problem are known in advance.
KNOWLEDGE ENGINEERING REVIEW
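
The abstract mentions SarsaLandmark without giving pseudocode. The following is a minimal, illustrative sketch, assuming the commonly cited formulation in which eligibility traces are cut whenever the agent reaches a landmark observation (where its underlying state is unambiguous); the environment interface, the landmarks set, and all hyperparameters are assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, obs, actions, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(obs, a)])

def sarsa_landmark(env, actions, landmarks, episodes=500,
                   alpha=0.1, gamma=0.95, lam=0.9, epsilon=0.1):
    """Tabular Sarsa(lambda) over observations; traces are cleared at landmarks."""
    Q = defaultdict(float)                    # Q[(observation, action)]
    for _ in range(episodes):
        e = defaultdict(float)                # eligibility traces
        obs = env.reset()
        act = epsilon_greedy(Q, obs, actions, epsilon)
        done = False
        while not done:
            next_obs, reward, done = env.step(act)
            next_act = epsilon_greedy(Q, next_obs, actions, epsilon)
            target = reward + (0.0 if done else gamma * Q[(next_obs, next_act)])
            delta = target - Q[(obs, act)]
            e[(obs, act)] += 1.0              # accumulating trace
            for key in list(e):
                Q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam
            if next_obs in landmarks:
                e.clear()                     # state is unambiguous at a landmark
            obs, act = next_obs, next_act
    return Q
```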

Suggestions

A matching algorithm based on linear features
Atalay, Mehmet Volkan (Elsevier BV, 1998-07-01)
A two-step feature matching algorithm, primarily aimed at problems related to the analysis of aerial images of man-made sites, is presented. Only linear features and their geometric attributes are used in the algorithm. First, the rotation between the two images is calculated, and then matching by relaxation is performed assuming that there is only translation.
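
As an illustration of the two-step idea only (the paper's relaxation step is replaced here by a greedy nearest-neighbour match): estimate the rotation between two images from the orientations of their line features, then match line midpoints assuming the remaining transform is a pure translation. Array shapes and the tolerance are assumptions.

```python
import numpy as np

def estimate_rotation(angles_a, angles_b, bins=180):
    """Dominant orientation difference (degrees, mod 180) between two line sets."""
    diffs = (np.subtract.outer(angles_b, angles_a) % 180.0).ravel()
    hist, edges = np.histogram(diffs, bins=bins, range=(0.0, 180.0))
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])          # bin centre

def match_midpoints(mid_a, mid_b, rotation_deg, tol=10.0):
    """Greedy nearest-neighbour matching after rotating set A into set B's frame."""
    t = np.radians(rotation_deg)
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    rotated = mid_a @ R.T
    translation = np.median(mid_b, axis=0) - np.median(rotated, axis=0)
    moved = rotated + translation                   # assume pure translation remains
    matches = []
    for i, p in enumerate(moved):
        d = np.linalg.norm(mid_b - p, axis=1)
        j = int(np.argmin(d))
        if d[j] < tol:
            matches.append((i, j))
    return matches
```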
A complete axiomatization for fuzzy functional and multivalued dependencies in fuzzy database relations
Sozat, MI; Yazıcı, Adnan (Elsevier BV, 2001-01-15)
This paper first introduces the formal definitions of fuzzy functional and multivalued dependencies which are given on the basis of the conformance values presented here. Second, the inference rules are listed after both fuzzy functional and multivalued dependencies are shown to be consistent, that is, they reduce to those of the classic functional and multivalued dependencies when crisp attributes are involved. Finally, the inference rules presented here are shown to be sound and complete for the family of...
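
For orientation only, these are the classical (crisp) inference rules that the fuzzy rules are stated to reduce to when all attributes are crisp; the paper's conformance-based fuzzy definitions are not reproduced here.

```latex
% Crisp reference point (not the paper's fuzzy, conformance-based rules):
% R is the full attribute set; -> a functional, ->> a multivalued dependency.
\begin{align*}
&\text{Reflexivity:}     && Y \subseteq X \;\Rightarrow\; X \rightarrow Y\\
&\text{Augmentation:}    && X \rightarrow Y \;\Rightarrow\; XZ \rightarrow YZ\\
&\text{Transitivity:}    && X \rightarrow Y,\; Y \rightarrow Z \;\Rightarrow\; X \rightarrow Z\\
&\text{Complementation:} && X \twoheadrightarrow Y \;\Rightarrow\; X \twoheadrightarrow R \setminus (X \cup Y)\\
&\text{Replication:}     && X \rightarrow Y \;\Rightarrow\; X \twoheadrightarrow Y
\end{align*}
```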
Improving reinforcement learning by using sequence trees
Girgin, Sertan; Polat, Faruk; Alhajj, Reda (Springer Science and Business Media LLC, 2010-12-01)
This paper proposes a novel approach to discover options in the form of stochastic conditionally terminating sequences; it shows how such sequences can be integrated into the reinforcement learning framework to improve the learning performance. The method utilizes stored histories of possible optimal policies and constructs a specialized tree structure during the learning process. The constructed tree facilitates the process of identifying frequently used action sequences together with states that are visit...
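
A rough, assumption-laden sketch of the underlying idea (not the paper's sequence-tree construction): build a prefix tree over action sequences drawn from stored episode histories and read off branches whose visit counts exceed a threshold as candidate conditionally terminating sequences. The names, depth bound, and threshold are illustrative.

```python
from collections import defaultdict

class SequenceTreeNode:
    def __init__(self):
        self.count = 0
        self.children = defaultdict(SequenceTreeNode)

def build_sequence_tree(histories, max_depth=6):
    """Insert every bounded-length action subsequence from each episode history."""
    root = SequenceTreeNode()
    for actions in histories:                       # one action list per episode
        for start in range(len(actions)):
            node = root
            for a in actions[start:start + max_depth]:
                node = node.children[a]
                node.count += 1
    return root

def frequent_sequences(node, min_count, prefix=()):
    """Yield action sequences whose prefix-tree count reaches min_count."""
    for action, child in node.children.items():
        if child.count >= min_count:
            yield prefix + (action,), child.count
            yield from frequent_sequences(child, min_count, prefix + (action,))
```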
A Hopfield neural network with multi-compartmental activation
Akhmet, Marat (Springer Science and Business Media LLC, 2018-05-01)
The Hopfield network is a form of recurrent artificial neural network. To satisfy the demands of artificial neural networks and brain activity, such networks need to be modified in different ways. Accordingly, for the first time, our paper considers a Hopfield neural network with a piecewise constant argument of generalized type and a constant delay. To incorporate both types of argument, a multi-compartmental activation function is utilized. For the analysis of the problem, we have applied the ...
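
Since the record gives no equations, the sketch below assumes a generic model form: a Hopfield-type network whose right-hand side combines an instantaneous term, a constant-delay term, and a term evaluated at a piecewise constant argument, integrated with forward Euler. The matrices, the tanh activation, and the grid length are assumptions, not the paper's system.

```python
import numpy as np

def simulate(A, B, C, D, I, x0, tau=0.5, grid=0.4, dt=0.01, T=10.0):
    """Forward-Euler simulation of an assumed Hopfield-type model with a
    constant delay tau and a piecewise constant argument held on a grid."""
    f = np.tanh                                        # assumed activation
    steps = int(T / dt)
    delay_steps = int(round(tau / dt))
    xs = [x0.copy() for _ in range(delay_steps + 1)]   # history for t <= 0
    for k in range(steps):
        t = k * dt
        x = xs[-1]                                     # state at time t
        x_delay = xs[-1 - delay_steps]                 # state at time t - tau
        t_pc = grid * np.floor(t / grid)               # piecewise constant argument
        x_pc = xs[int(round(t_pc / dt)) + delay_steps] # state held at grid point
        dx = -A @ x + B @ f(x) + C @ f(x_delay) + D @ f(x_pc) + I
        xs.append(x + dt * dx)
    return np.array(xs[delay_steps:])                  # trajectory from t = 0 on
```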
Learning Context on a Humanoid Robot using Incremental Latent Dirichlet Allocation
Çelikkanat, Hande; Orhan, Guner; Pugeault, Nicolas; Guerin, Frank; Şahin, Erol; Kalkan, Sinan (Institute of Electrical and Electronics Engineers (IEEE), 2016-03-01)
In this paper, we formalize and model context in terms of a set of concepts grounded in the sensorimotor interactions of a robot. The concepts are modeled as a web using a Markov Random Field (MRF), inspired by the concept web hypothesis for representing concepts in humans. On this concept web, we treat context as a latent variable of Latent Dirichlet Allocation (LDA), a widely used method in computational linguistics for modeling topics in texts. We extend the standard LDA method in order to make ...
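
As a toy illustration only, the sketch below shows the incremental topic-modelling idea using gensim's online LDA in place of the paper's extended incremental LDA; the "concept" tokens and episodes are invented stand-ins for the robot's sensorimotor observations.

```python
from gensim import corpora, models

# Invented episodes: each is a bag of discretized concept observations.
episodes = [
    ["graspable", "cylindrical", "rolls", "on_table"],
    ["soft", "deformable", "squeezed", "in_hand"],
]
dictionary = corpora.Dictionary(episodes)
corpus = [dictionary.doc2bow(ep) for ep in episodes]

# Initial topic (context) model over the concept vocabulary.
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2)

# New interaction data arrives: update the model incrementally.
new_episodes = [["graspable", "rolls", "pushed", "on_table"]]
lda.update([dictionary.doc2bow(ep) for ep in new_episodes])

# Infer the latent context mixture for the newest episode.
print(lda.get_document_topics(dictionary.doc2bow(new_episodes[0])))
```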
Citation Formats
A. Demir, E. Cilden, and F. Polat, “Automatic landmark discovery for learning agents under partial observability,” KNOWLEDGE ENGINEERING REVIEW, 2019. [Online]. Available: https://hdl.handle.net/11511/39370.