Bipedal Robot Walking by Reinforcement Learning in Partially Observed Environment

2021-8-27
Özalp, Uğurcan
Deep Reinforcement Learning methods have been successfully applied to mechanical control in many environments and have been used instead of traditional optimal and adaptive control methods for some complex problems. However, Deep Reinforcement Learning algorithms still face challenges; one of them is control in partially observable environments. When an agent is not well informed about the environment, it must recover the missing information from past observations. In this thesis, walking in the Bipedal Walker Hardcore environment (OpenAI Gym), which is partially observable, is studied with two continuous actor-critic reinforcement learning algorithms: Twin Delayed Deep Deterministic Policy Gradient and Soft Actor-Critic. Three neural architectures are implemented. The first is a Residual Feed-Forward Neural Network under the fully observable environment assumption, while the second and third are a Long Short-Term Memory network and a Transformer that use the observation history as input to recover the information hidden by partial observability.
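As a concrete illustration of the history-based architectures described in the abstract, the following is a minimal sketch (not the thesis code) of an LSTM actor that consumes a short window of past observations in BipedalWalkerHardcore-v3, assuming PyTorch and the classic Gym step/reset API; the window length, layer sizes, and class names are illustrative assumptions, and the TD3/SAC training machinery is omitted.

```python
# Minimal sketch: an LSTM actor mapping a window of past observations to a
# continuous action for BipedalWalkerHardcore-v3. All hyperparameters here
# (hidden size, history length) are illustrative assumptions.
import gym
import torch
import torch.nn as nn


class RecurrentActor(nn.Module):
    """Encodes the observation history with an LSTM, then outputs a bounded action."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, act_dim),
            nn.Tanh(),  # BipedalWalker actions lie in [-1, 1]
        )

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        # obs_history: (batch, time, obs_dim); the last LSTM output summarizes the history.
        features, _ = self.lstm(obs_history)
        return self.head(features[:, -1])


if __name__ == "__main__":
    # Assumes the classic Gym API (reset -> obs, step -> 4-tuple), as used around 2021.
    env = gym.make("BipedalWalkerHardcore-v3")
    obs_dim = env.observation_space.shape[0]  # 24-dimensional hull, joint, and lidar readings
    act_dim = env.action_space.shape[0]       # 4 joint torques
    actor = RecurrentActor(obs_dim, act_dim)

    history_len = 8  # assumed observation window; zero-padded at episode start
    obs = env.reset()
    history = [torch.zeros(obs_dim)] * (history_len - 1) + [torch.as_tensor(obs, dtype=torch.float32)]

    for _ in range(10):
        window = torch.stack(history[-history_len:]).unsqueeze(0)  # (1, T, obs_dim)
        action = actor(window).squeeze(0).detach().numpy()
        obs, reward, done, info = env.step(action)
        history.append(torch.as_tensor(obs, dtype=torch.float32))
        if done:
            obs = env.reset()
            history = [torch.zeros(obs_dim)] * (history_len - 1) + [torch.as_tensor(obs, dtype=torch.float32)]
```

A Transformer variant would replace the LSTM with self-attention over the same observation window, while the Residual Feed-Forward baseline would act on a single observation under the full-observability assumption.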

Suggestions

Improving reinforcement learning using distinctive clues of the environment
Demir, Alper; Polat, Faruk; Department of Computer Engineering (2019)
Effective decomposition and abstraction have been shown to improve the performance of Reinforcement Learning. An agent can use clues from the environment either to partition the problem into sub-problems or to get informed about its progress in a given task. In a fully observable environment such clues may come from subgoals, while in a partially observable environment they may be provided by unique experiences. The contribution of this thesis is twofold: first, improvements over automatic subgoal identifica...
Mobile Robot Heading Adjustment Using Radial Basis Function Neural Networks Controller and Reinforcement Learning
BAYAR, GÖKHAN; Konukseven, Erhan İlhan; Koku, Ahmet Buğra (2008-10-28)
This paper proposes a radial basis function neural network approach to the solution of mobile robot heading adjustment using reinforcement learning. In order to control the heading of the mobile robot, a neural network control system has been constructed and implemented. The neural controller has been charged with enhancing the control system by adding some degree of strength. It has been shown that the neural network system can learn the relationship between the desired directional heading and the error posi...
Effective subgoal discovery and option generation in reinforcement learning
Demir, Alper; Polat, Faruk; Department of Computer Engineering (2016)
Subgoal discovery is proven to be a practical way to cope with large state spaces in Reinforcement Learning. Subgoals are natural hints to partition the problem into sub-problems, allowing the agent to solve each sub-problem separately. Identification of such subgoal states in the early phases of the learning process increases the learning speed of the agent. In a problem modeled as a Markov Decision Process, subgoal states possess key features that distinguish them from the ordinary ones. A learning agent ...
Visual Object Tracking with Autoencoder Representations
Besbinar, Beril; Alatan, Abdullah Aydın (2016-05-19)
Deep learning is the discipline of training computational models that are composed of multiple layers, and these methods have recently improved the state of the art in many areas by virtue of large labeled datasets, increases in the computational power of current hardware, and unsupervised training methods. Although such a dataset may not be available for many application areas, the representations obtained by well-designed networks that have a large representation capacity and are trained with enough dat...
Using Generative Adversarial Nets on Atari Games for Feature Extraction in Deep Reinforcement Learning
Aydın, Ayberk; Sürer, Elif (2020-04-01)
Deep Reinforcement Learning (DRL) has been successfully applied in several research domains such as robot navigation and automated video game playing. However, these methods require excessive computation and interaction with the environment, so enhancements on sample efficiency are required. The main reason for this requirement is that sparse and delayed rewards do not provide effective supervision for representation learning of deep neural networks. In this study, Proximal Policy...
Citation Formats
U. Özalp, “Bipedal Robot Walking by Reinforcement Learning in Partially Observed Environment,” M.S. - Master of Science, Middle East Technical University, 2021.