Good features to correlate for visual tracking

2017
Gündoğdu, Erhan
Estimating object motion is one of the key components of video processing and the first step in applications which require video representation. Visual object tracking is one way of extracting this component, and it is one of the major problems in the field of computer vision. Numerous discriminative and generative machine learning approaches have been employed to solve this problem. Recently, correlation filter based (CFB) approaches have been popular due to their computational efficiency and notable performance on benchmark datasets. The ultimate goal of CFB approaches is to find a filter (i.e., a template) which produces high correlation outputs around the actual object location and low correlation outputs at locations far from the object. Nevertheless, CFB visual tracking methods struggle with many challenges, such as occlusion, abrupt appearance changes, fast motion and object deformation. The main causes are that the simple update stages of CFB methods forget the past poses of the object, that the model update rate is non-optimal, and that the features are not invariant to appearance changes of the target object. In order to address these disadvantages of CFB visual tracking methods, this thesis includes three major contributions.
First, a spatial window learning method is proposed to improve the correlation quality. For this purpose, a window that is to be element-wise multiplied by the object observation (or the correlation filter) is learned by a novel gradient descent procedure. The learned window is capable of suppressing or highlighting the necessary regions of the object, and can improve the tracking performance in the case of occlusions and object deformation. As the second contribution, an ensemble of trackers algorithm is proposed to handle the issues of non-optimal learning rate and forgetting the past poses of the object. The trackers in the ensemble are organized in a binary tree, which stores individual expert trackers at its nodes. During the course of tracking, the expert trackers most relevant to the most recent object appearance are activated and utilized in the localization and update stages. The proposed ensemble method significantly improves the tracking accuracy, especially when the expert trackers are selected as the CFB trackers utilizing the proposed window learning method. The final contribution of the thesis addresses the feature learning problem, specifically focused on the CFB visual tracking loss function. For this loss function, a novel backpropagation algorithm is developed to train any fully convolutional deep neural network. The proposed gradient calculation, which is required for backpropagation, is performed efficiently in both the frequency and image domains, and its complexity is linear in the number of feature maps. The network model is trained on carefully curated datasets that include well-known difficulties of visual tracking, e.g., occlusion, object deformation and fast motion. When the learned features are integrated into state-of-the-art CFB visual trackers, favorable tracking performance is obtained on benchmark datasets against CFB methods that employ hand-crafted features or deep features extracted from pre-trained classification models.
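To illustrate the correlation filter idea described in the abstract (a high response at the object location, low elsewhere), here is a minimal single-channel sketch in Python with NumPy. It is not the method of the thesis: it assumes the standard closed-form ridge-regression solution in the Fourier domain, in the style of MOSSE-type filters, with a Gaussian-shaped desired response. The function names (`gaussian_response`, `train_filter`, `respond`) and the parameter values (`sigma`, `reg`) are hypothetical, chosen only for illustration.

```python
import numpy as np


def gaussian_response(shape, center, sigma=2.0):
    """Desired correlation output: a Gaussian peak at `center` (row, col)."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def train_filter(patch, target, reg=1e-2):
    """Closed-form regularized least-squares filter, computed per frequency.

    Assumption: single feature channel and circular (periodic) correlation.
    """
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(target)
    return (Y * np.conj(X)) / (X * np.conj(X) + reg)


def respond(filter_hat, patch):
    """Apply the filter in the frequency domain and return the response map."""
    return np.real(np.fft.ifft2(filter_hat * np.fft.fft2(patch)))


# Toy example: a random patch stands in for the object observation.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
target = gaussian_response(patch.shape, center=(16, 16))
filter_hat = train_filter(patch, target)

# The response on the training patch peaks at the chosen center.
resp = respond(filter_hat, patch)
peak = np.unravel_index(np.argmax(resp), resp.shape)

# A circular shift of the patch moves the response peak by the same offset.
shifted = np.roll(patch, shift=(3, -5), axis=(0, 1))
resp_shifted = respond(filter_hat, shifted)
peak_shifted = np.unravel_index(np.argmax(resp_shifted), resp_shifted.shape)

print("peak on training patch:", peak)
print("peak on shifted patch:", peak_shifted)
```

In a tracker, the displacement of the response peak between frames gives the estimated object motion. The window learning, ensemble, and feature learning contributions summarized above build on this basic formulation and are not reproduced here.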

Suggestions

A comparative study on pose estimation algorithms using visual data
Çetinkaya, Güven; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2012)
Computation of the position and orientation of an object with respect to a camera from its images is called the pose estimation problem. Pose estimation is one of the major problems in computer vision, robotics and photogrammetry. Object tracking, object recognition and self-localization of robots are typical applications of pose estimation. Determining the pose of an object from its projections requires a 3D model of the object in its own reference system, the camera parameters and a 2D image of the object. Mo...
Hierarchical representations for visual object tracking by detection
Beşbınar, Beril; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2015)
Deep learning is the discipline of training computational models that are composed of multiple layers, and these methods have improved the state of the art in many areas such as visual object detection, scene understanding and speech recognition. The rebirth of these fairly old computational models is usually attributed to the availability of large datasets, the increase in the computational power of current hardware and more recently proposed unsupervised training methods that exploit the internal structure of very lar...
Comparison of histograms of oriented optical flow based action recognition methods
Erciş, Fırat; Ulusoy, İlkay; Department of Electrical and Electronics Engineering (2012)
In the task of human action recognition in uncontrolled video, motion features are widely used in order to achieve subject and appearance invariance. We implemented three Histograms of Oriented Optical Flow based methods, which share a common motion feature extraction phase. We compute an optical flow field over each frame of the video. Then the flow vectors are histogrammed according to their angle values to represent each frame with a histogram. In order to capture local motions, the bounding box of the subject is divided...
Object recognition and segmentation via shape models
Altınoklu, Metin Burak; Ulusoy, İlkay; Tarı, Zehra Sibel; Department of Electrical and Electronics Engineering (2016)
In this thesis, the problem of object detection, recognition and segmentation in computer vision is addressed with shape based methods. An efficient object detection method based on a sparse skeleton has been proposed. The proposed method is an improved chamfer template matching method for recognition of articulated objects. Using a probabilistic graphical model structure, shape variation is represented in a skeletal shape model, where nodes correspond to parts consisting of lines and edges correspond to pa...
Motion estimation using complex discrete wavelet transform
Sarı, Hüseyin; Severcan, Mete; Department of Electrical and Electronics Engineering (2003)
The estimation of optical flow has become a vital research field in image sequence analysis, especially in the past two decades, and has found applications in many fields such as stereo optics, video compression, robotics and computer vision. In this thesis, the complex wavelet based algorithm for the estimation of optical flow developed by Magarey and Kingsbury is implemented and investigated. The algorithm is based on a complex version of the discrete wavelet transform (CDWT), which analyzes an image through blo...
Citation Formats
E. Gündoğdu, “Good features to correlate for visual tracking,” Ph.D. - Doctoral Program, Middle East Technical University, 2017.