Hierarchical representations for visual object tracking by detection

Beşbınar, Beril
Deep learning is the discipline of training computational models that are composed of multiple layers, and these methods have improved the state of the art in many areas such as visual object detection, scene understanding and speech recognition. The revival of these fairly old computational models is usually attributed to the availability of large datasets, the increase in the computational power of current hardware, and more recently proposed unsupervised training methods that exploit the internal structure of very large, unlabeled datasets. An exhaustive search for good parameters, whose number is usually on the order of thousands or even millions, is very unlikely to yield a meaningful model when the available dataset is relatively small; this is why deep architectures are rarely used for visual object tracking, a challenging yet very important task in computer vision. In this thesis, we investigate the use of hierarchical representations within the tracking-by-detection framework, a common strategy in visual object tracking that treats tracking as a detection problem in still images, with temporal information handled within a Bayesian approach. Stacked autoencoders and convolutional neural networks are trained using auxiliary datasets, and the resulting hierarchical representations are evaluated both off-the-shelf and after fine-tuning the pre-trained models using the few samples available. Experiments are conducted using a challenge toolkit, which not only enables a fair comparison of hierarchical representations with well-known and widely used hand-crafted features under the same tracking-by-detection setting, but also shows how the framework used performs among recent visual tracking algorithms. Test results show that exploiting the intricate structure in an auxiliary dataset, even without fine-tuning, contributes to the solution of the visual object tracking problem.
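As a rough illustration of the tracking-by-detection idea described above, the following minimal sketch combines per-frame detector scores with a Bayesian recursion over a one-dimensional grid of candidate positions. It is not the thesis's implementation: the grid filter, the function names (`predict`, `update`, `track`), the motion kernel and the score values are all assumptions chosen for illustration, and the detector scores stand in for the output of a classifier applied to learned or hand-crafted features.

```python
def predict(belief, motion_kernel):
    """Prediction step: spread the previous belief with a motion model.

    `motion_kernel` is a hypothetical, odd-length list of transition weights
    centred on "no movement". Mass that would leave the grid is dropped and
    the result is renormalised.
    """
    n = len(belief)
    half = len(motion_kernel) // 2
    predicted = [0.0] * n
    for i, p in enumerate(belief):
        for k, w in enumerate(motion_kernel):
            j = i + k - half
            if 0 <= j < n:
                predicted[j] += p * w
    total = sum(predicted)
    return [v / total for v in predicted]


def update(predicted, detector_scores):
    """Update step: weight the prediction by per-position detector scores.

    The scores play the role of a likelihood; they are assumed to be
    non-negative and not all zero.
    """
    unnormalised = [p * s for p, s in zip(predicted, detector_scores)]
    total = sum(unnormalised)
    return [v / total for v in unnormalised]


def track(initial_belief, scores_per_frame, motion_kernel):
    """Return the most probable position for each frame."""
    belief = initial_belief
    path = []
    for scores in scores_per_frame:
        belief = update(predict(belief, motion_kernel), scores)
        path.append(max(range(len(belief)), key=belief.__getitem__))
    return path


# Toy data: the target starts at position 2 and moves right by one per frame.
# Frame 1 contains a distractor at position 5 with a high detector score.
initial = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
kernel = [0.25, 0.5, 0.25]
frames = [
    [0.1, 0.1, 0.2, 0.9, 0.1, 0.8],
    [0.1, 0.1, 0.1, 0.3, 0.9, 0.1],
]
print(track(initial, frames, kernel))
```

In this toy run the distractor at position 5 is suppressed because the motion model assigns it no prior probability, which is the role the temporal model plays when detection alone is ambiguous.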


Data-driven image captioning via salient region discovery
Kilickaya, Mert; Akkuş, Burak Kerim; Çakıcı, Ruket; Erdem, Aykut; Erdem, Erkut; İKİZLER CİNBİŞ, NAZLI (Institution of Engineering and Technology (IET), 2017-09-01)
In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have been proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image r...
Object recognition and segmentation via shape models
Altınoklu, Metin Burak; Ulusoy, İlkay; Tarı, Zehra Sibel; Department of Electrical and Electronics Engineering (2016)
In this thesis, the problem of object detection, recognition and segmentation in computer vision is addressed with shape-based methods. An efficient object detection method based on a sparse skeleton has been proposed. The proposed method is an improved chamfer template matching method for recognition of articulated objects. Using a probabilistic graphical model structure, shape variation is represented in a skeletal shape model, where nodes correspond to parts consisting of lines and edges correspond to pa...
Metric learning using deep recurrent networks for visual clustering and retrieval
Can, Oğul; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2018)
Learning an image similarity metric plays a key role in visual analysis, especially for the cases where a training set contains a large number of hard negative samples that are difficult to distinguish from other classes. Due to the outstanding results of deep metric learning on visual tasks, such as image clustering and retrieval, selecting a proper loss function and a sampling method becomes a central issue to boost the performance. The existing metric learning approaches have two significant drawback...
Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?
KRÜGER, Norbert; JANSSEN, Peter; Kalkan, Sinan; LAPPE, Markus; LEONARDİS, Ales; PİATER, Justus; Rodriguez-Sanchez, Antonio J.; WİSKOTT, Laurenz (Institute of Electrical and Electronics Engineers (IEEE), 2013-08-01)
Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Or...
Comparison of histograms of oriented optical flow based action recognition methods
Erciş, Fırat; Ulusoy, İlkay; Department of Electrical and Electronics Engineering (2012)
In the task of human action recognition in uncontrolled video, motion features are widely used in order to achieve subject and appearance invariance. We implemented three Histograms of Oriented Optical Flow based methods, which share a common motion feature extraction phase. We compute an optical flow field over each frame of the video. The flow vectors are then binned by angle so that each frame is represented by a histogram. In order to capture local motions, the bounding box of the subject is divided...
Citation Formats
B. Beşbınar, “Hierarchical representations for visual object tracking by detection,” M.S. - Master of Science, Middle East Technical University, 2015.