A context-aware model for autonomous agent stochastic planning

Markov Decision Processes (MDPs) cannot exploit domain information effectively due to their representational limitations. The lack of elements that make the model aware of context leads to unstructured problem representations, such as raw probability matrices or lists. This makes these tools significantly less efficient at determining a useful policy as the state space of a task grows, as is the case for realistic problems with localized dependencies between states and actions. In this paper, we present a new model, the Context-Aware Markov Decision Process (CA-MDP), which builds on the MDP to represent Markovian sequential decision-making problems in a more structured manner. CA-MDP modifies and augments the MDP formalism by integrating causal relationships between actions and states, thereby enabling a structured, and where possible compact, representation of the task. To demonstrate the expressive power of CA-MDP, we give theoretical bounds on the complexity of conversion between MDP and CA-MDP. Next, to generate an optimal policy from the CA-MDP encoding by exploiting these newly defined facilities, we devise a new solver algorithm based on value iteration (VI), called Context-Aware Value Iteration (CA-VI). Although classical dynamic programming (DP) algorithms are effective at determining optimal policies, they do not scale well with the size of the state-action space, making both the MDP encoding and its associated solvers practically unusable for real-life problems. CA-VI overcomes this scalability problem by exploiting the structural information provided by CA-MDP. We first give a theoretical analysis of CA-VI by examining the expected number of Bellman updates performed on arbitrary tasks.
Finally, we present experiments on numerous problems, together with remarks and discussion on key aspects of CA-VI and CA-MDP, to validate our theoretical analyses empirically and to assess the practical performance of CA-VI with the CA-MDP formulation, analysing its execution time and how closely it approaches the practical minimum runtime bound relative to VI on the MDP encoding of the same task.
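As background for the DP baseline that CA-VI builds on, the following is a minimal sketch of standard value iteration with Bellman optimality updates on an explicit (unstructured) MDP. The data layout (`P`, `R` dictionaries) and the toy problem are illustrative assumptions, not the paper's CA-MDP encoding; the sketch only shows the per-state Bellman update whose count the analysis above refers to.

```python
# Standard value iteration on an explicitly enumerated MDP (a sketch;
# the dictionary-based encoding here is an assumption for illustration).

def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """P[s][a]: list of (prob, next_state); R[s][a]: immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality update: best one-step lookahead value.
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:  # converged when no state value moves by >= eps
            break
    # Extract a greedy policy from the converged value function.
    policy = {
        s: max(actions,
               key=lambda a: R[s][a]
               + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
        for s in states
    }
    return V, policy
```

Note that every sweep touches every state-action pair, which is precisely the scaling behaviour the abstract identifies as impractical for large tasks and which a structured encoding aims to avoid.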