Utilizing video colorization as a self-supervised auxiliary task for object tracking

2021-2-08
Fırat, Engin
In this thesis, we study combining a Siamese-network-based object tracker with a second model trained under the self-supervised learning paradigm. We define grayscale video colorization as the pretext task for self-supervised learning and select similarity-based object tracking as the downstream task. Both the Siamese tracker and the colorization network rely on the similarity between subsequent video frames, and the spatio-temporal coherence between the frames of a video enables a network to learn this similarity. We study different ways of combining the two networks. Because the colorization framework is built on similarity learning, we cross-correlate the output features of the colorization network in the same way as in the Siamese tracker. We then combine the two methods by taking the weighted average of their score maps to obtain a combined score map, and we search for the optimal value of this weight through several experiments. We also conduct experiments with different neural network architectures for the colorization framework. Our experimental results show that using the self-supervised pretext task improves the overall success rate when the combined network is further trained in a supervised manner. We also show that the self-supervised video colorization network offers an alternative way to use modern, deeper networks in Siamese architectures by alleviating the strict translational invariance restriction that Siamese architectures require.
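The two operations named in the abstract, cross-correlating exemplar features against search-region features and taking a weighted average of two score maps, can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the feature shapes, and the min-max normalization step are assumptions made here for illustration, and the abstract does not state how (or whether) the score maps are rescaled before averaging.

```python
import numpy as np


def cross_correlate(search_feat, exemplar_feat):
    """Slide exemplar features over search features ("valid" positions only).

    Both inputs are assumed to be (channels, height, width) arrays.
    Returns a 2-D score map; higher values mean higher similarity.
    """
    c_s, h_s, w_s = search_feat.shape
    c_e, h_e, w_e = exemplar_feat.shape
    if c_s != c_e:
        raise ValueError("channel counts must match")
    scores = np.empty((h_s - h_e + 1, w_s - w_e + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            window = search_feat[:, i:i + h_e, j:j + w_e]
            scores[i, j] = np.sum(window * exemplar_feat)
    return scores


def min_max_normalize(score_map):
    """Rescale a score map to [0, 1] (an assumption, not from the abstract)."""
    span = score_map.max() - score_map.min()
    if span == 0:
        return np.zeros_like(score_map, dtype=float)
    return (score_map - score_map.min()) / span


def combine_score_maps(tracker_scores, colorization_scores, weight):
    """Weighted average of two same-sized score maps.

    `weight` is the share given to the tracker map; (1 - weight) goes to
    the colorization map.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if tracker_scores.shape != colorization_scores.shape:
        raise ValueError("score maps must have the same shape")
    return weight * tracker_scores + (1.0 - weight) * colorization_scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature maps standing in for the two networks' outputs.
    search_a, exemplar_a = rng.normal(size=(4, 10, 10)), rng.normal(size=(4, 3, 3))
    search_b, exemplar_b = rng.normal(size=(4, 10, 10)), rng.normal(size=(4, 3, 3))
    tracker_map = min_max_normalize(cross_correlate(search_a, exemplar_a))
    color_map = min_max_normalize(cross_correlate(search_b, exemplar_b))
    for weight in (0.0, 0.25, 0.5, 0.75, 1.0):  # candidate weights to sweep
        combined = combine_score_maps(tracker_map, color_map, weight)
        peak = np.unravel_index(np.argmax(combined), combined.shape)
        print(f"weight={weight:.2f} peak={peak}")
```

The loop at the end mirrors the idea of searching over the weight: in practice each candidate weight would be scored by tracking performance on a validation set rather than by inspecting the peak location.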

Suggestions

Training object detectors by directly optimizing lrp metric
Çam, Barış Can; Akbaş, Emre; Kalkan, Sinan; Department of Computer Engineering (2020-9)
This thesis focuses on training deep object detection networks by directly optimizing the localisation-recall-precision (LRP) performance metric that can evaluate classification and localisation performance of an object detector in a unified manner (Oksuz et al., 2018). To achieve this goal, unlike the commonly used linear weighting approach, we aim to implicitly optimize the LRP metric first by using a bounded localisation loss from previous works and proposing a loss function that can bound the range ...
Transformation of conceptual models to executable High Level Architecture federation models
Özhan, Gürkan; Oğuztüzün, Mehmet Halit S. (Springer, 2015-01-01)
In this chapter, we present a formal, declarative, and visual model transformation methodology to map a domain conceptual model (CM) to a distributed simulation architecture model (DSAM). The approach adheres to the principles of model-driven engineering (MDE). A two-phased automatic transformation strategy is delineated to translate a field artillery conceptual model (ACM) into a high-level architecture (HLA) federation architecture model (FAM). The produced model is then compiled by the code generator to ...
Visual Object Tracking with Autoencoder Representations
Besbinar, Beril; Alatan, Abdullah Aydın (2016-05-19)
Deep learning is the discipline of training computational models that are composed of multiple layers, and these methods have recently improved the state of the art in many areas by virtue of large labeled datasets, increases in the computational power of current hardware, and unsupervised training methods. Although such a dataset may not be available for many application areas, the representations obtained by the well-designed networks that have a large representation capacity and trained with enough dat...
Examining an Online Collaboration Learning Environment with the Dual Eye-Tracking Paradigm: The Case of Virtual Math Teams
Uzunosmanoglu, Selin Deniz; Çakır, Murat Perit (2014-06-27)
The aim of this study is to investigate computer-supported collaborative problem-solving processes using the dual eye-tracking method. Eighteen university students participated in this study, and 9 pairs tried to solve 10 geometry problems using the Virtual Math Team (VMT) online environment. The study tries to identify the situations in which the participants' eye movements and eye gazes overlap, and how the usability of the VMT environment affects the problem-solving processes. After experiments with two eye-trackers, a questionnai...
Extended Target Tracking Using Polynomials With Applications to Road-Map Estimation
Lundquist, Christian; Orguner, Umut; Gustafsson, Fredrik (Institute of Electrical and Electronics Engineers (IEEE), 2011-01-01)
This paper presents an extended target tracking framework which uses polynomials in order to model extended objects in the scene of interest from imagery sensor data. State-space models are proposed for the extended objects which enables the use of Kalman filters in tracking. Different methodologies of designing measurement equations are investigated. A general target tracking algorithm that utilizes a specific data association method for the extended targets is presented. The overall algorithm must always ...
Citation Formats
E. Fırat, “Utilizing video colorization as a self-supervised auxiliary task for object tracking,” M.S. - Master of Science, Middle East Technical University, 2021.