Pruning algorithms for partially observable markov decision processes

Download
2017
Özgen, Selim
It is possible to represent the value function in partially observable Markov decision processes as a piecewise linear function if the state, action, and observation space is discrete. Exact value iteration algorithm searches for this value function by creating an exponential number of linear functions at each step, many of which can be pruned without changing the value of the value function. The pruning procedure is made possible by the use of linear programming. This study first gives a geometric framework of the pruning procedure. It shows that the linear programming iterations refer to the selection of different convex regions in the vector space representation of the pruning problem. We also put forward an algebraic framework, which is the utilization and maintenance of linear programs. It shows how the problem can be decomposed into small sized LPs and what the LP iterations refer to. While stating these two theoretical frameworks, their relations have also been exploited. The exponential increase in the number of vectors in any step of the exact value iteration algorithm is due to an operation called the cross-sum addition of a set of vectors. This operation results in a new set of vectors. It is known that for any of the summand vectors in this new set to be non-dominated, the addend vectors entering the cross-sum addition should have intersecting support sets. The given geometric and algebraic framework has further been extended to exploit this particular property of the cross-sum operation. Two novel pruning algorithms have been offered in this study. First algorithm, called FastCone, can be used for pruning any given set of vectors. For a given set of clean vectors at any step, the algorithm hastily searches for the convex region that a dirty vector is in and tries to find a clean vector if only the given set of clean vectors is not sufficient to make the decision about this dirty vector. The second algorithm is called Cross-Sum Pruning with Multiple Objective Functions, where the aim is to find the vectors that have non-intersecting support sets with the current active vectors in each simplex iteration. This approach is useful because when two vectors from two different sets with non-intersecting support sets are detected, it is possible to delete all ordered pairs containing these two vectors. And this amounts to a simple sign check of the coefficients of a row of the simplex tableau. To show the algorithms' performance, both algorithms have been compared to the conventional algorithms and their revised versions both analytically and experimentally.

Suggestions

Implementable policies: Discounted cost case
Serin, Yaşar Yasemin (1995-01-18)
We consider a Markov decision process (MDP) with finite state space S and finite action set A. The state space is partitioned into K sets S-1, S-2, ..., S-K. A stationary randomized policy is described by the parameters {alpha(ia), i is an element of S, a is an element of A}, where alpha(ia) = the parobability that action a is taken when the system is in state i. A policy is called implementable if alpha(ia) = alpha(ja) for all a is an element of A whenever states i and j belong to a common subset S-r for s...
Almost periodic solutions of the linear differential equation with piecewise constant argument
Akhmet, Marat (2009-10-01)
The paper is concerned with the existence and stability of almost periodic solutions of linear systems with piecewise constant argument where t∈R, x ∈ Rn [·] is the greatest integer function. The Wexler inequality [1]-[4] for the Cauchy's matrix is used. The results can be easily extended for the quasilinear case. A new technique of investigation of equations with piecewise argument, based on an integral representation formula, is proposed. Copyright © 2009 Watam Press.
Finite bisimulations for switched linear systems
Aydın Göl, Ebru; Lazar, Mircea; Belta, Calin (2013-02-04)
In this paper, we consider the problem of constructing a finite bisimulation quotient for a discrete-time switched linear system in a bounded subset of its state space. Given a set of observations over polytopic subsets of the state space and a switched linear system with stable subsystems, the proposed algorithm generates the bisimulation quotient in a finite number of steps with the aid of sublevel sets of a polyhedral Lyapunov function. Starting from a sublevel set that includes the origin in its interio...
Finite Bisimulations for Switched Linear Systems
Aydın Göl, Ebru; Lazar, Mircea; Belta, Calin (2014-12-01)
In this paper, we consider the problem of constructing a finite bisimulation quotient for a discrete-time switched linear system in a bounded subset of its state space. Given a set of observations over polytopic subsets of the state space and a switched linear system with stable subsystems, the proposed algorithm generates the bisimulation quotient in a finite number of steps with the aid of sublevel sets of a polyhedral Lyapunov function. Starting from a sublevel set that includes the origin in its interio...
Time Constrained Temporal Logic Control of Multi Affine Systems
Aydın Göl, Ebru (2012-01-01)
In this paper, we consider the problem of controlling a dynamical system such that its trajectories satisfy a temporal logic property in a given amount of time. We focus on multiaffine systems and specifications given as syntactically co-safe linear temporal logic formulas over rectangular regions in the state space. The proposed algorithm is based on the estimation of time bounds for facet reachability problems and solving a time optimal reachability problem on the product between a weighted transition sys...
Citation Formats
S. Özgen, “Pruning algorithms for partially observable markov decision processes,” Ph.D. - Doctoral Program, Middle East Technical University, 2017.