Visual object tracking using semi supervised convolutional filters

Sevindik, Emir Can
Visual object tracking aims to find the position of a single object in each video frame, given an annotated bounding box in the first frame. Correlation filters have consistently produced excellent results in terms of accuracy while keeping computational complexity low. The main idea of correlation filter based trackers is to find a filter that generates high response values around the true target location and relatively low values at locations away from the object. Recently, deep learning based methods have emerged that learn discriminative features for use in correlation filters, with promising results. Training of such deep feature extractors is usually performed by using both supervised and unsupervised learning techniques. In this thesis, the impact of semi-supervised convolutional filters on the visual tracking problem is investigated in order to obtain robust features that predict the object location with high accuracy and are invariant to any kind of appearance change. Two different semi-supervision techniques are proposed and trained separately on the ILSVRC2015 and TrackingNet datasets. They are tested on the widely used OTB50 and OTB100 tracking benchmark datasets. On the OTB50 benchmark, semi-supervision on the ILSVRC2015 dataset leads to a 1.1% gain in success plot AUC, a 2.4% increase in precision plot AUC, and a 2.2% increase in success rate. Similarly, on the OTB100 test set, a 1.9% gain in success plot AUC, a 2.0% gain in precision plot AUC, and a 2.8% increase in success rate are observed in the semi-supervised experiments. In addition to the semi-supervision methods, two joint supervision methodologies are examined to observe the performance differences. The results show that both semi-supervision and joint supervision perform better than the fully supervised models, and that each technique still has advantages over the other in different situations.
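
The correlation filter idea described in the abstract (a high response at the target location, low responses elsewhere) can be illustrated with a minimal sketch. This is not the thesis method: it assumes a single grayscale training patch, a Gaussian-shaped desired response, and a MOSSE-style closed-form solution in the Fourier domain. The function names and the parameters `sigma` and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_response(shape, center, sigma=2.0):
    """Desired filter output: a 2-D Gaussian peaked at the target center (row, col)."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_filter(patch, response, lam=1e-2):
    """Regularized least-squares filter in the Fourier domain.

    Returns the conjugate filter H* such that H* . F(patch) approximates
    F(response). lam is an assumed regularization weight.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(response)
    return G * np.conj(F) / (F * np.conj(F) + lam)

def locate(filter_conj, patch):
    """Apply the filter to a patch and return the peak position and the response map."""
    response = np.real(np.fft.ifft2(filter_conj * np.fft.fft2(patch)))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return peak, response

# Toy usage: train on a random texture, then shift it circularly.
rng = np.random.default_rng(0)
first_frame = rng.standard_normal((64, 64))
h_conj = train_filter(first_frame, gaussian_response(first_frame.shape, (32, 32)))

# The response peak follows the (circular) shift of the patch content.
next_frame = np.roll(first_frame, shift=(5, -3), axis=(0, 1))
peak, _ = locate(h_conj, next_frame)
print("estimated target position:", peak)
```

In a deep correlation filter tracker, the raw pixel patch would be replaced by learned feature maps; the sketch above only shows the filter-and-peak step on a single channel.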

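The success and precision figures quoted in the abstract are benchmark metrics computed from per-frame bounding boxes. The sketch below follows a common OTB-style convention, which is an assumption here and may differ from the exact protocol used in the thesis: boxes are `(x, y, w, h)`, the success curve is the fraction of frames whose overlap (IoU) exceeds each of 21 evenly spaced thresholds, its AUC is taken as the mean of that curve, and precision is the fraction of frames whose center error is within a pixel threshold. All function names are hypothetical.

```python
import numpy as np

def iou(pred, gt):
    """Per-frame intersection over union for boxes of shape (N, 4) as (x, y, w, h)."""
    a, b = np.asarray(pred, float), np.asarray(gt, float)
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    return inter / union

def success_curve(pred, gt, thresholds=np.linspace(0, 1, 21)):
    """Fraction of frames whose IoU exceeds each overlap threshold."""
    overlaps = iou(pred, gt)
    return np.array([(overlaps > t).mean() for t in thresholds])

def success_auc(pred, gt):
    """Area under the success curve, approximated by its mean (assumed convention)."""
    return success_curve(pred, gt).mean()

def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames with IoU above a single threshold (0.5 is an assumption)."""
    return (iou(pred, gt) > threshold).mean()

def precision_at(pred, gt, pixels=20.0):
    """Fraction of frames whose box-center distance is within `pixels`."""
    a, b = np.asarray(pred, float), np.asarray(gt, float)
    centers_a = a[:, :2] + a[:, 2:] / 2.0
    centers_b = b[:, :2] + b[:, 2:] / 2.0
    errors = np.linalg.norm(centers_a - centers_b, axis=1)
    return (errors <= pixels).mean()

# Toy usage with two frames: one perfect prediction, one shifted by 5 pixels.
gt_boxes = [[0, 0, 10, 10], [0, 0, 10, 10]]
pred_boxes = [[0, 0, 10, 10], [5, 0, 10, 10]]
print("success AUC:", success_auc(pred_boxes, gt_boxes))
print("success rate:", success_rate(pred_boxes, gt_boxes))
print("precision at 20 px:", precision_at(pred_boxes, gt_boxes))
```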

Correlation tracking based on wavelet domain information
Ipek, HL; Yilmaz, I; Yardimci, YC; Cetin, AE (2003-08-07)
Tracking moving objects in video can be carried out by correlating a template containing object pixels with pixels of the current frame. This approach may produce erroneous results under noise. We determine a set of significant pixels on the object by analyzing the wavelet transform of the template and correlate only these pixels with the current frame to determine the next position of the object. These significant pixels are easily trackable features of the image and increase the performance of the tracker.
Fine-grained object recognition and zero-shot learning in multispectral imagery
Sumbul, Gencer; Cinbiş, Ramazan Gökberk; AKSOY, SELİM (2018-05-05)
We present a method for the fine-grained object recognition problem, which aims to recognize the type of an object among a large number of sub-categories, and for the zero-shot learning scenario on multispectral images. In order to establish a relation between seen classes and new unseen classes, a compatibility function between image features extracted from a convolutional neural network and auxiliary information of classes is learnt. Knowledge transfer for unseen classes is carried out by maximizing this function. Per...
Keyframe based bi directional 2 D mesh representation for video object tracking and manipulation
Eren, Pekin Erhan (1999-10-28)
We propose a new bi-directional 2-D mesh representation of video objects, which utilizes multiple keyframes with forward and backward tracking. Experimental results on use of this representation for video object tracking in the presence of self occlusion are presented.
Rescoring detections based on contextual scores in object detection
Zorlu, Ersan Vural; Akbaş, Emre; Department of Computer Engineering (2019)
To detect objects in an image, current state-of-the-art object detectors firstly define candidate object locations, and then classify each of them into one of the predefined categories or as background. They do so by using the visual features extracted locally from the candidate locations; omitting the rich contextual information embedded in the whole image. Contextual information can be utilized to complement the information extracted locally and thereby to improve object detection accuracy. Researchers have p...
Multi Modal Satellite Image Registration Using SIFT
Vural, Mehmet Firat; Yardimci, Yasemin; Temizel, Alptekin (2009-04-11)
Multi-modal images need to be registered in order to use the unique information contained in these different modality images. In this paper, modifications to the Scale Invariant Feature Transform (SIFT), a popular method for image matching, that improve its success on multi-modal images are described. The SIFT algorithm is immune to linear and partially immune to non-linear illumination changes. However, due to non-linear illumination changes on multi-modal images, SIFT is not as powerful as it is ...
Citation Formats
E. C. Sevindik, “Visual object tracking using semi supervised convolutional filters,” M.S. - Master of Science, Middle East Technical University, 2020.