A Heuristic temporal difference approach with adaptive grid discretization

Fikir, Ozan Bora (2016)
Reinforcement learning (RL), as an area of machine learning, tackles the problem defined in an environment where an autonomous agent ought to take actions to achieve an ultimate goal. In RL problems, the environment is typically formulated as a Markov decision process (MDP). However, in real-life problems the environment can rarely be formulated as an MDP without flaws, and we need to relax the full observability assumption of the MDP. The resulting model is the partially observable Markov decision process (POMDP), which is a more realistic model but forms a more difficult problem setting. In this model the agent cannot directly access the true state of the environment; it only receives observations that provide partial information about the true state. There are two common ways to solve POMDP problems: the first is to neglect the true state of the environment and rely directly on the observations; the second is to define a belief state, which is a probability distribution over the actual states. However, since the belief state is defined as a probability distribution, the agent has to deal with a continuous space, unlike in the MDP case, and this may easily become intractable from the agent's perspective. In this thesis, we focus on belief space solutions and attempt to reduce the complexity of the belief space by partitioning the continuous belief space into well-defined, regular regions with two different types of grid discretization as an abstraction over the belief space. Then we define an approximate ...
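The belief-tracking and grid-abstraction ideas summarized in the abstract can be illustrated with a short sketch. The code below is a minimal illustration rather than the thesis implementation: it assumes a small discrete POMDP with a known transition model T[a, s, s'] and observation model Z[a, s', o], updates the belief with a standard Bayes filter, and snaps the resulting belief onto a fixed-resolution regular grid over the probability simplex. The function names, the largest-remainder rounding scheme, and the resolution parameter are assumptions for illustration only.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Standard Bayes filter for a discrete POMDP.

    b : current belief, shape (|S|,), sums to 1
    a : action index
    o : observation index
    T : transition model, T[a, s, s'] = P(s' | s, a)
    Z : observation model, Z[a, s', o] = P(o | s', a)
    """
    # Predict: push the belief through the transition model.
    predicted = b @ T[a]                   # shape (|S|,)
    # Correct: weight by the likelihood of the received observation.
    unnormalized = predicted * Z[a, :, o]
    norm = unnormalized.sum()
    if norm == 0.0:
        raise ValueError("Observation has zero probability under the model.")
    return unnormalized / norm

def discretize_belief(b, resolution=4):
    """Snap a belief onto a regular grid over the simplex.

    Coordinates are expressed in units of 1/resolution; mass lost by
    rounding down is reassigned to the largest-remainder coordinates so
    the grid point still sums to one (one simple grid scheme among many).
    """
    scaled = b * resolution
    grid = np.floor(scaled).astype(int)
    remainder = scaled - grid
    missing = resolution - grid.sum()
    for idx in np.argsort(-remainder)[:missing]:
        grid[idx] += 1
    return tuple(grid)  # hashable grid cell, usable as a table key

# Tiny two-state example with one noisy and one uninformative action.
T = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.85, 0.15], [0.15, 0.85]],
              [[0.5, 0.5], [0.5, 0.5]]])

b = np.array([0.5, 0.5])
b = belief_update(b, a=0, o=0, T=T, Z=Z)
print(b, discretize_belief(b, resolution=4))
```

With this kind of abstraction, a tabular temporal difference learner can key its value estimates on the discrete grid cell instead of the continuous belief vector.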

Suggestions

Effective subgoal discovery and option generation in reinforcement learning
Demir, Alper; Polat, Faruk; Department of Computer Engineering (2016)
Subgoal discovery is proven to be a practical way to cope with large state spaces in Reinforcement Learning. Subgoals are natural hints to partition the problem into sub-problems, allowing the agent to solve each sub-problem separately. Identification of such subgoal states in the early phases of the learning process increases the learning speed of the agent. In a problem modeled as a Markov Decision Process, subgoal states possess key features that distinguish them from the ordinary ones. A learning agent ...
Factored reinforcement learning using extended sequence trees
Şahin, Coşkun; Polat, Faruk; Department of Computer Engineering (2015)
Reinforcement Learning (RL) is an area concerned with learning how to act in an environment to reach a final state while gaining the maximum amount of reward. The Markov Decision Process (MDP) is the formal framework used to define an RL task. In addition to the different techniques proposed to solve MDPs, there are several studies on improving RL algorithms, because these methods are often inadequate for real-world problems. Classical approaches require enumeration of all possible states to find a solution. But when states a...
AN EFFICIENT DATABASE TRANSITIVE CLOSURE ALGORITHM
Toroslu, İsmail Hakkı; HENSCHEN, L (Springer Science and Business Media LLC, 1994-05-01)
The integration of logic rules and relational databases has recently emerged as an important technique for developing knowledge management systems. An important class of logic rules utilized by these systems is the so-called transitive closure rules, the processing of which requires the computation of the transitive closure of database relations referenced by these rules. This article presents a new algorithm suitable for computing the transitive closure of very large database relations. This algorithm proc...
A Multinomial prototype-based learning algorithm
Bulut, Ahmet Can; Kalkan, Sinan; Department of Computer Engineering (2014)
Recent studies in the machine learning field have proved that ideas once thought impractical are in fact tangible. Over the years, researchers have managed to develop learning systems which are able to interact with the environment and use experiences for adaptation to new conditions. Humanoid robots can now learn concepts such as nouns, adjectives and verbs, which is a big step for building human-like learners. Behind all these achievements lies the development of successful learning and classification technique...
Automatic identification of transitional bottlenecks in reinforcement learning under partial observability
Aydın, Hüseyin; Polat, Faruk; Department of Computer Engineering (2017)
Instance-based methods are proven tools for solving reinforcement learning problems with hidden states. Nearest Sequence Memory (NSM) is a widely known instance-based approach, mainly based on the k-Nearest Neighbor algorithm. NSM keeps track of the raw history of action-observation-reward instances within a fixed-length (or, ideally, unlimited) memory. It calculates the neighborhood for the current state through a recursive comparison of the matching action-observation-reward tuples with the previous ones. The ones with...
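As a rough illustration of the instance-based idea summarized in the last suggested item (not that thesis's code), the sketch below stores a history of (action, observation, reward) tuples and scores each past position by the length of the suffix it shares with the current history, in the spirit of Nearest Sequence Memory. The function names and the simple one-step averaging used in place of a full Q-value vote are assumptions.

```python
def suffix_match_length(history, i, j):
    """Length of the matching (action, observation, reward) suffix
    ending at positions i and j of the history."""
    k = 0
    while i - k >= 0 and j - k >= 0 and history[i - k] == history[j - k]:
        k += 1
    return k

def nsm_q_estimate(history, candidate_action, k=5):
    """Crude NSM-style estimate: find past positions whose recent history
    best matches the current one and whose next action equals
    candidate_action, then average the rewards that followed them."""
    current = len(history) - 1
    scored = []
    for j in range(current):  # only past positions with a known successor
        next_action, _, next_reward = history[j + 1]
        if next_action != candidate_action:
            continue
        scored.append((suffix_match_length(history, current, j), next_reward))
    scored.sort(reverse=True)          # longest-matching suffixes first
    neighbors = scored[:k]
    if not neighbors:
        return 0.0
    return sum(r for _, r in neighbors) / len(neighbors)

# Example: a short history of (action, observation, reward) tuples.
h = [(0, 'door', 0), (1, 'corridor', 0), (0, 'door', 0), (1, 'corridor', 1),
     (0, 'door', 0)]
print(nsm_q_estimate(h, candidate_action=1, k=2))
```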
Citation Formats
O. B. Fikir, “A Heuristic temporal difference approach with adaptive grid discretization,” M.S. - Master of Science, Middle East Technical University, 2016.