Hierarchical representations for visual object tracking by detection

2015
Beşbınar, Beril
Deep learning is the discipline of training computational models composed of multiple layers, and these methods have improved the state of the art in many areas, such as visual object detection, scene understanding, and speech recognition. The revival of these fairly old computational models is usually attributed to the availability of large datasets, the increase in the computational power of current hardware, and more recently proposed unsupervised training methods that exploit the internal structure of very large, unlabeled datasets. An exhaustive search for good parameters, which usually number in the thousands or even millions, is very unlikely to yield a meaningful model when the available dataset is relatively small; this is why deep architectures are rarely used for visual object tracking, a challenging yet very important task in computer vision. In this thesis, we investigate the use of hierarchical representations within the tracking-by-detection framework, a common strategy in visual object tracking that treats tracking as a detection problem in still images, with temporal information handled within a Bayesian approach. Stacked autoencoders and convolutional neural networks are trained on auxiliary datasets, and the resulting hierarchical representations are evaluated both off-the-shelf and after fine-tuning the pre-trained models with the few samples available. Experiments are conducted using a challenge toolkit, which not only enables a fair comparison of hierarchical representations with well-known and widely used hand-crafted features under the same tracking-by-detection setting, but also demonstrates the performance of the framework used among all recent visual tracking algorithms. Test results show that exploiting the intricate structure in an auxiliary dataset, even without fine-tuning, contributes to the solution of the visual object tracking problem.

Suggestions

Visual Object Tracking with Autoencoder Representations
Besbinar, Beril; Alatan, Abdullah Aydın (2016-05-19)
Deep learning is the discipline of training computational models composed of multiple layers, and these methods have recently improved the state of the art in many areas by virtue of large labeled datasets, the increase in the computational power of current hardware, and unsupervised training methods. Although such a dataset may not be available for many application areas, the representations obtained by the well-designed networks that have a large representation capacity and trained with enough dat...
Metric learning using deep recurrent networks for visual clustering and retrieval
Can, Oğul; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2018)
Learning an image similarity metric plays a key role in visual analysis, especially for the cases where a training set contains a large number of hard negative samples that are difficult to distinguish from other classes. Due to the outstanding results of the deep metric learning on visual tasks, such as image clustering and retrieval, selecting a proper loss function and a sampling method becomes a central issue to boost the performance. The existing metric learning approaches have two significant drawback...
Object recognition and segmentation via shape models
Altınoklu, Metin Burak; Ulusoy, İlkay; Tarı, Zehra Sibel; Department of Electrical and Electronics Engineering (2016)
In this thesis, the problem of object detection, recognition and segmentation in computer vision is addressed with shape based methods. An efficient object detection method based on a sparse skeleton has been proposed. The proposed method is an improved chamfer template matching method for recognition of articulated objects. Using a probabilistic graphical model structure, shape variation is represented in a skeletal shape model, where nodes correspond to parts consisting of lines and edges correspond to pa...
Visual object detection and tracking using local convolutional context features and recurrent neural networks
Kaya, Emre Can; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2018)
Visual object detection and tracking are two major problems in computer vision which have important real-life application areas. During the last decade, Convolutional Neural Networks (CNNs) have received significant attention and outperformed methods that rely on handcrafted representations in both detection and tracking. On the other hand, Recurrent Neural Networks (RNNs) are commonly preferred for modeling sequential data such as video sequences. A novel convolutional context feature extension is introduc...
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
Bernardi, Raffaella; Cakici, Ruket; Elliott, Desmond; Erdem, Aykut; Erdem, Erkut; Ikizler-Cinbis, Nazli; Keller, Frank; Muscat, Adrian; Plank, Barbara (AI Access Foundation, 2016-02-23)
Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities. In this survey, we classify the existing approaches based on how they conceptualize this problem, viz., models that cast description as either a generation problem or a retrieval problem over a visual or multimodal representational space. We provide a detailed review of existing models, highlighting their ad...
Citation Formats
B. Beşbınar, “Hierarchical representations for visual object tracking by detection,” M.S. - Master of Science, Middle East Technical University, 2015.