A Heuristic temporal difference approach with adaptive grid discretization

Fikir, Ozan Bora
Reinforcement learning (RL), as an area of machine learning, tackle with the problem defined in an environment where an autonomous agent ought to take actions to achieve an ultimate goal. In RL problems, the environment is typically formulated as a Markov decision process. However, in real life problems, the environment is not flawless to be formulated as an MDP, and we need to relax fully observability assumption of MDP. The resulting model is partially observable Markov decision process, which is a more realistic model but forms a difficult problem setting. In this model agent cannot directly access to true state of the environment, but to the observations which provides a partial information about the true state of environment. There are two common ways to solve POMDP problems; first one is to neglect the true state of the environment and directly rely on the observations. The second one is to define a belief state which is probability distribution over the actual states. However, since the belief state definition is based on probability distribution, the agent has to handle with continuous space unlike MDP case, which may become intractable easily in autonomous agent perspective. In this thesis, we focus on belief space solutions and attempt to reduce the complexity of belief space by partitioning continuous belief space into well-defined and regular regions with two different types of grid discretization as an abstraction over belief space. Then we define an approximate.


Effective subgoal discovery and option generation in reinforcement learning
Demir, Alper; Polat, Faruk; Department of Computer Engineering (2016)
Subgoal discovery is proven to be a practical way to cope with large state spaces in Reinforcement Learning. Subgoals are natural hints to partition the problem into sub-problems, allowing the agent to solve each sub-problem separately. Identification of such subgoal states in the early phases of the learning process increases the learning speed of the agent. In a problem modeled as a Markov Decision Process, subgoal states possess key features that distinguish them from the ordinary ones. A learning agent ...
A Concept Filtering Approach for Diverse Density to Discover Subgoals in Reinforcement Learning
DEMİR, ALPER; Cilden, Erkin; Polat, Faruk (2017-11-08)
In the reinforcement learning context, subgoal discovery methods aim to find bottlenecks in problem state space so that the problem can naturally be decomposed into smaller subproblems. In this paper, we propose a concept filtering method that extends an existing subgoal discovery method, namely diverse density, to be used for both fully and partially observable RL problems. The proposed method is successful in discovering useful subgoals with the help of multiple instance learning. Compared to the original...
Factored reinforcement learning using extended sequence trees
Şahin, Coşkun; Polat, Faruk; Department of Computer Engineering (2015)
Reinforcement Learning (RL) is an area concerned with learning how to act in an environment to reach a final state while gaining maximum amount of reward. Markov Decision Process (MDP) is the formal framework to define an RL task. In addition to different techniques proposed to solve MDPs, there are several studies to improve RL algorithms. Because these methods are often inadequate for real-world problems. Classical approaches require enumeration of all possible states to find a solution. But when states a...
Simple and complex behavior learning using behavior hidden Markov Model and CobART
Seyhan, Seyit Sabri; Alpaslan, Ferda Nur; Department of Computer Engineering (2013)
In this thesis, behavior learning and generation models are proposed for simple and complex behaviors of robots using unsupervised learning methods. Simple behaviors are modeled by simple-behavior learning model (SBLM) and complex behaviors are modeled by complex-behavior learning model (CBLM) which uses previously learned simple or complex behaviors. Both models have common phases named behavior categorization, behavior modeling, and behavior generation. Sensory data are categorized using correlation based...
A Multinomial prototype-based learning algorithm
Bulut, Ahmet Can; Kalkan, Sinan; Department of Computer Engineering (2014)
Recent studies in machine learning field proved that ideas which were once thought impractical are in fact tangible. Over the years, researchers have managed to develop learning systems which are able to interact with the environment and use experiences for adaptation to new conditions. Humanoid robots can now learn concepts such as nouns, adjectives and verbs, which is a big step for building human-like learners. Behind all these achievements, development of successful learning and classification technique...
Citation Formats
O. B. Fikir, “A Heuristic temporal difference approach with adaptive grid discretization,” M.S. - Master of Science, Middle East Technical University, 2016.