Pruning algorithms for partially observable markov decision processes

Download
2017
Özgen, Selim
It is possible to represent the value function in partially observable Markov decision processes as a piecewise linear function if the state, action, and observation space is discrete. Exact value iteration algorithm searches for this value function by creating an exponential number of linear functions at each step, many of which can be pruned without changing the value of the value function. The pruning procedure is made possible by the use of linear programming. This study first gives a geometric framework of the pruning procedure. It shows that the linear programming iterations refer to the selection of different convex regions in the vector space representation of the pruning problem. We also put forward an algebraic framework, which is the utilization and maintenance of linear programs. It shows how the problem can be decomposed into small sized LPs and what the LP iterations refer to. While stating these two theoretical frameworks, their relations have also been exploited. The exponential increase in the number of vectors in any step of the exact value iteration algorithm is due to an operation called the cross-sum addition of a set of vectors. This operation results in a new set of vectors. It is known that for any of the summand vectors in this new set to be non-dominated, the addend vectors entering the cross-sum addition should have intersecting support sets. The given geometric and algebraic framework has further been extended to exploit this particular property of the cross-sum operation. Two novel pruning algorithms have been offered in this study. First algorithm, called FastCone, can be used for pruning any given set of vectors. For a given set of clean vectors at any step, the algorithm hastily searches for the convex region that a dirty vector is in and tries to find a clean vector if only the given set of clean vectors is not sufficient to make the decision about this dirty vector. The second algorithm is called Cross-Sum Pruning with Multiple Objective Functions, where the aim is to find the vectors that have non-intersecting support sets with the current active vectors in each simplex iteration. This approach is useful because when two vectors from two different sets with non-intersecting support sets are detected, it is possible to delete all ordered pairs containing these two vectors. And this amounts to a simple sign check of the coefficients of a row of the simplex tableau. To show the algorithms' performance, both algorithms have been compared to the conventional algorithms and their revised versions both analytically and experimentally.

Suggestions

Finite bisimulations for switched linear systems
Aydın Göl, Ebru; Lazar, Mircea; Belta, Calin (2013-02-04)
In this paper, we consider the problem of constructing a finite bisimulation quotient for a discrete-time switched linear system in a bounded subset of its state space. Given a set of observations over polytopic subsets of the state space and a switched linear system with stable subsystems, the proposed algorithm generates the bisimulation quotient in a finite number of steps with the aid of sublevel sets of a polyhedral Lyapunov function. Starting from a sublevel set that includes the origin in its interio...
Finite Bisimulations for Switched Linear Systems
Aydın Göl, Ebru; Lazar, Mircea; Belta, Calin (2014-12-01)
In this paper, we consider the problem of constructing a finite bisimulation quotient for a discrete-time switched linear system in a bounded subset of its state space. Given a set of observations over polytopic subsets of the state space and a switched linear system with stable subsystems, the proposed algorithm generates the bisimulation quotient in a finite number of steps with the aid of sublevel sets of a polyhedral Lyapunov function. Starting from a sublevel set that includes the origin in its interio...
Time Constrained Temporal Logic Control of Multi Affine Systems
Aydın Göl, Ebru (2012-01-01)
In this paper, we consider the problem of controlling a dynamical system such that its trajectories satisfy a temporal logic property in a given amount of time. We focus on multiaffine systems and specifications given as syntactically co-safe linear temporal logic formulas over rectangular regions in the state space. The proposed algorithm is based on the estimation of time bounds for facet reachability problems and solving a time optimal reachability problem on the product between a weighted transition sys...
Bounded operators and complemented subspaces of Cartesian products
DJAKOV, PLAMEN; TERZİOĞLU, AHMET TOSUN; Yurdakul, Murat Hayrettin; Zahariuta, V. (2011-02-01)
We study the structure of complemented subspaces in Cartesian products X x Y of Kothe spaces X and Y under the assumption that every linear continuous operator from X to Y is bounded. In particular, it is proved that each non-Montel complemented subspace with absolute basis E subset of X x Y is isomorphic to a space of the form E(1) x E(2), where E(1) is a complemented subspace of X and E(2) is a complemented subspace of Y. (C) 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Efficient Surface Integral Equation Methods for the Analysis of Complex Metamaterial Structures
Yla-Oijala, Pasi; Ergül, Özgür Salih; Gurel, Levent; Taskinen, Matti (2009-03-27)
Two approaches, the multilevel fast multipole algorithm with sparse approximate inverse preconditioner and the surface equivalence principle algorithm, are applied to analyze complex three-dimensional metamaterial structures. The efficiency and performance of these methods are studied and discussed.
Citation Formats
S. Özgen, “Pruning algorithms for partially observable markov decision processes,” Ph.D. - Doctoral Program, Middle East Technical University, 2017.