An analysis of stereo depth estimation utilizing attention mechanisms, self-supervised pose estimators & temporal predictions

Date

2022-5-18

Author

Oğuzman, Utku

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Following the recent success of deep learning, real-world applications of stereo depth estimation algorithms have attracted the interest of many researchers, who use the available synthetic and real-world datasets to evaluate their ideas for practical use. This thesis presents a thorough analysis with the same aim. It attempts to improve state-of-the-art stereo depth estimation algorithms by incorporating attention mechanisms into current networks and by using better initialization strategies across time. For this purpose, varying numbers of attention modules are applied to one of the most successful stereo depth estimation networks. When trained on synthetic stereo datasets under a supervised setting, the proposed attention-based networks outperform a baseline algorithm. When the networks are finetuned on a small annotated real-world dataset, however, the baseline algorithm performs better. Secondly, the temporal information available in the synthetic datasets is leveraged by teaching the proposed network to initialize the current iteration from its previous predictions. Finally, to better finetune the network for real-world use with temporal information, a large unannotated real-world dataset is used under a self-supervised training setting with ego-pose estimation and optical flow networks. Overall, these settings yield better results than state-of-the-art methods in the synthetic-to-real supervised training setting, and comparable results after finetuning.
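
The abstract does not state which loss drives the self-supervised training. A common choice in self-supervised depth estimation is a photometric reconstruction error: one view is warped into the other using the predicted disparity, and the difference between the two is penalized. The sketch below illustrates that idea on a single rectified scanline in plain Python. It is an assumption made for illustration, not the thesis's method; the function names `warp_right_to_left` and `photometric_l1` and the toy data are hypothetical.

```python
from typing import List, Optional

Row = List[float]


def warp_right_to_left(right: Row, disparity: Row) -> List[Optional[float]]:
    """Reconstruct a left-image scanline by sampling the right scanline.

    For rectified stereo, left pixel x corresponds to right pixel x - d(x).
    Sampling uses linear interpolation; positions that fall outside the
    right scanline are returned as None (no valid correspondence).
    """
    width = len(right)
    out: List[Optional[float]] = []
    for x, d in enumerate(disparity):
        src = x - d
        if src < 0 or src > width - 1:
            out.append(None)
            continue
        x0 = int(src)                  # floor, because src >= 0 here
        x1 = min(x0 + 1, width - 1)    # right neighbour, clamped to the row
        w = src - x0                   # interpolation weight in [0, 1)
        out.append((1.0 - w) * right[x0] + w * right[x1])
    return out


def photometric_l1(left: Row, right: Row, disparity: Row) -> float:
    """Mean absolute error between the left row and the warped right row,
    averaged over pixels that have a valid correspondence."""
    warped = warp_right_to_left(right, disparity)
    errors = [abs(l - w) for l, w in zip(left, warped) if w is not None]
    return sum(errors) / len(errors) if errors else 0.0


# Toy scanline (hypothetical data): the left view is the right view
# shifted by 2 pixels, so the true disparity is 2 everywhere.
right_row = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
left_row = [right_row[max(x - 2, 0)] for x in range(len(right_row))]

loss_true = photometric_l1(left_row, right_row, [2.0] * 8)
loss_wrong = photometric_l1(left_row, right_row, [0.0] * 8)
```

In this toy case the error vanishes at the true disparity and grows for a wrong one, which is the signal that lets a network learn from unannotated image pairs. Extending the same idea to consecutive frames would additionally require the camera motion between them, which is presumably where an ego-pose estimator comes in.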

Subject Keywords

Stereo depth estimation, Attention modules, Self-supervised learning, Finetuning

URI

https://hdl.handle.net/11511/97789

Collections

Graduate School of Natural and Applied Sciences, Thesis

Citation Formats

U. Oğuzman, “An analysis of stereo depth estimation utilizing attention mechanisms, self-supervised pose estimators & temporal predictions,” M.S. - Master of Science, Middle East Technical University, 2022.