Good features to correlate for visual tracking

Date

2017

Author

Gündoğdu, Erhan

Estimating object motion is a key component of video processing and the first step in applications that require video representation. Visual object tracking is one way of extracting this information, and it is one of the major problems in computer vision. Numerous discriminative and generative machine learning approaches have been employed to solve it. Recently, correlation filter based (CFB) approaches have become popular due to their computational efficiency and notable performance on benchmark datasets. The ultimate goal of CFB approaches is to find a filter (i.e., a template) that produces high correlation outputs around the actual object location and low correlation outputs at locations far from the object. Nevertheless, CFB visual tracking methods struggle with many challenges, such as occlusion, abrupt appearance changes, fast motion and object deformation. The main causes are that past poses of the object are forgotten because of the simple update stages of CFB methods, that the model update rate is non-optimal, and that the features are not invariant to appearance changes of the target object. To address these disadvantages, this thesis makes three major contributions. First, a spatial window learning method is proposed to improve the correlation quality. For this purpose, a window that is element-wise multiplied by the object observation (or the correlation filter) is learned by a novel gradient descent procedure. The learned window can suppress or highlight the relevant regions of the object, and can improve tracking performance under occlusion and object deformation. As the second contribution, an ensemble-of-trackers algorithm is proposed to handle the issues of non-optimal learning rate and forgetting the past poses of the object.
The trackers in the ensemble are organized in a binary tree, which stores individual expert trackers at its nodes. During tracking, the expert trackers most relevant to the most recent object appearance are activated and used in the localization and update stages. The proposed ensemble method significantly improves tracking accuracy, especially when the expert trackers are CFB trackers that use the proposed window learning method. The final contribution of the thesis addresses feature learning targeted specifically at the CFB visual tracking loss function. For this loss function, a novel backpropagation algorithm is developed to train any fully convolutional deep neural network. The proposed gradient calculation, which is required for backpropagation, is performed efficiently in both the frequency and image domains, and its complexity is linear in the number of feature maps. The network is trained on carefully curated datasets that include well-known difficulties of visual tracking, e.g., occlusion, object deformation and fast motion. When the learned features are integrated into state-of-the-art CFB visual trackers, favorable tracking performance is obtained on benchmark datasets compared with CFB methods that employ hand-crafted features or deep features extracted from pre-trained classification models.
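
To make the basic CFB idea described above concrete, the following is a minimal sketch of a generic single-channel correlation filter learned in closed form in the frequency domain (in the style of the well-known MOSSE formulation). It is not the method proposed in the thesis: it contains no learned spatial window, no tracker ensemble and no learned features. The function names, the Gaussian target width `sigma` and the regularization constant `lam` are illustrative assumptions.

```python
import numpy as np


def gaussian_response(shape, center, sigma=2.0):
    """Desired correlation output: a Gaussian peak at the object location."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def train_filter(patch, target, lam=1e-2):
    """Closed-form regularized filter (complex conjugate, frequency domain).

    lam is an assumed regularization constant that avoids division by zero.
    """
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(target)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)


def correlate(filter_conj, patch):
    """Correlation output for a patch; its peak is the estimated location."""
    return np.real(np.fft.ifft2(filter_conj * np.fft.fft2(patch)))


def peak_location(response):
    """Row and column index of the maximum of the correlation output."""
    idx = np.unravel_index(np.argmax(response), response.shape)
    return int(idx[0]), int(idx[1])


# Synthetic example: learn a filter on one patch, then locate a shifted copy.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
filt = train_filter(patch, gaussian_response(patch.shape, (16, 16)))

# Circularly shift the patch by 3 rows down and 2 columns left.
shifted = np.roll(patch, shift=(3, -2), axis=(0, 1))
print(peak_location(correlate(filt, patch)))    # peak at the training center
print(peak_location(correlate(filt, shifted)))  # peak moves with the shift
```

In this sketch the filter yields a high output at the object location and low outputs elsewhere, and the peak follows the object when the patch is shifted. Practical CFB trackers additionally update the filter over time with a learning rate, which is the update stage the abstract identifies as a source of forgetting.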

Subject Keywords

Robot vision, Computer vision, Image processing, Tracking (Engineering)

URI

http://etd.lib.metu.edu.tr/upload/12621448/index.pdf
https://hdl.handle.net/11511/26941

Collections

Graduate School of Natural and Applied Sciences, Thesis

Citation Formats

E. Gündoğdu, “Good features to correlate for visual tracking,” Ph.D. - Doctoral Program, Middle East Technical University, 2017.