A thorough analysis of unsupervised depth and ego-motion estimation

Sarı, Alp Eren
In recent years, jointly solving unsupervised depth estimation and pose estimation has brought unprecedented success in depth estimation. In this study, we perform a thorough analysis of this approach. First, the pose estimation performance of classical techniques, such as COLMAP, is compared against recent unsupervised learning-based techniques. Simulation results indicate the superiority of the Bundle Adjustment step in classical techniques. Next, the effect of the number of input frames given to the pose estimator network is investigated in detail. The experiments performed at this step reveal that the state of the art can be improved by providing extra frames to the pose estimator network. Finally, the semantic labels of objects in the scene are utilized individually during the pose and depth estimation stages, using pre-trained semantic segmentation networks. The effects of computing losses from different regions of the scene and of averaging different pose estimates with learnable weights are investigated. The poses and losses corresponding to different semantic classes are summed with learnable weights, yielding results comparable to state-of-the-art methods.
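The weighted combination of per-class poses and losses described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the function names (`softmax`, `combine_poses`, `combine_losses`), the 6-DoF pose layout, the softmax normalization of the weights, and the numeric values are all hypothetical. In a real training setup the logits would be learnable parameters updated by gradient descent; here they are fixed so that the arithmetic is easy to follow.

```python
import numpy as np

def softmax(logits):
    """Turn unconstrained logits into positive weights that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def combine_poses(class_poses, logits):
    """Weighted sum of per-class pose estimates.

    class_poses: array of shape (C, 6), one pose vector per semantic class
                 (assumed layout: 3 translation + 3 small-angle rotation terms).
    logits:      array of shape (C,), stand-in for learnable weights.
    """
    weights = softmax(logits)
    return weights @ class_poses

def combine_losses(class_losses, logits):
    """Weighted sum of per-class losses using the same weighting scheme."""
    weights = softmax(logits)
    return float(weights @ class_losses)

# Hypothetical values for three semantic classes (e.g. road, building, vehicle).
class_poses = np.array([
    [0.10,  0.00, 1.00, 0.00, 0.01, 0.00],
    [0.12,  0.01, 0.98, 0.00, 0.02, 0.00],
    [0.08, -0.01, 1.02, 0.00, 0.00, 0.00],
])
class_losses = np.array([0.20, 0.35, 0.50])
logits = np.zeros(3)  # equal logits give uniform weights

fused_pose = combine_poses(class_poses, logits)
total_loss = combine_losses(class_losses, logits)
```

With equal logits the fused pose reduces to the plain average of the per-class poses. Linearly averaging rotation parameters is only a reasonable approximation for small inter-frame rotations, which is a further assumption of this sketch.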
Citation Formats
A. E. Sarı, “A thorough analysis of unsupervised depth and ego-motion estimation,” M.S. - Master of Science, 2020.