Visual object detection and tracking using local convolutional context features and recurrent neural networks

2018
Kaya, Emre Can
Visual object detection and tracking are two major problems in computer vision with important real-world applications. During the last decade, Convolutional Neural Networks (CNNs) have received significant attention and have outperformed methods that rely on handcrafted representations in both detection and tracking. Recurrent Neural Networks (RNNs), on the other hand, are commonly preferred for modeling sequential data such as video sequences. A novel convolutional context feature extension is introduced to a proposal-based detection scheme to improve object detection performance, and a comprehensive experimental study is conducted to demonstrate the effectiveness of this approach. On the tracking side, the effects of several design choices on an RNN-based tracking algorithm are investigated through comparative experiments. Finally, the proposed context-feature-based method is combined with the RNN-based tracking framework, yielding a joint detection-tracking framework that outperforms the baseline model.

Suggestions

Object recognition and segmentation via shape models
Altınoklu, Metin Burak; Ulusoy, İlkay; Tarı, Zehra Sibel; Department of Electrical and Electronics Engineering (2016)
In this thesis, the problem of object detection, recognition and segmentation in computer vision is addressed with shape based methods. An efficient object detection method based on a sparse skeleton has been proposed. The proposed method is an improved chamfer template matching method for recognition of articulated objects. Using a probabilistic graphical model structure, shape variation is represented in a skeletal shape model, where nodes correspond to parts consisting of lines and edges correspond to pa...
Hierarchical representations for visual object tracking by detection
Beşbınar, Beril; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2015)
Deep learning is the discipline of training computational models composed of multiple layers, and these methods have improved the state of the art in many areas, such as visual object detection, scene understanding and speech recognition. The rebirth of these fairly old computational models is usually attributed to the availability of large datasets, the increase in the computational power of current hardware and more recently proposed unsupervised training methods that exploit the internal structure of very lar...
3D Tracking of People with Rao-Blackwellized Particle Filters
Topcu, Osman; Orguner, Umut; Alatan, Abdullah Aydın; ERCAN, ALİ ÖZER (2014-04-25)
Visual tracking has an important place among computer vision applications, and visual tracking with particle filters is a well-known methodology. The performance of particle filters depends on efficient sampling of the state space, which in turn depends on the number of particles. In this paper, the Rao-Blackwell technique is applied to particle filters to improve sampling efficiency. Both algorithms are applied to the people tracking problem. Under the same circumstances, the resulting algorithm is demonstrated...
Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?
KRÜGER, Norbert; JANSSEN, Peter; Kalkan, Sinan; LAPPE, Markus; LEONARDİS, Ales; PİATER, Justus; Rodriguez-Sanchez, Antonio J.; WİSKOTT, Laurenz (Institute of Electrical and Electronics Engineers (IEEE), 2013-08-01)
Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Or...
Data-driven image captioning via salient region discovery
Kilickaya, Mert; Akkuş, Burak Kerim; Çakıcı, Ruket; Erdem, Aykut; Erdem, Erkut; İKİZLER CİNBİŞ, NAZLI (Institution of Engineering and Technology (IET), 2017-09-01)
In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image r...