Object Detection with Minimal Supervision

2023-1-18
Demirel, Berkan
Object detection is considered one of the most challenging problems in computer vision since it requires correctly predicting both the object classes and their locations. In the literature, object detection approaches are usually trained in a fully supervised manner, with a large amount of annotated data for all classes. Since data annotation is costly in terms of both time and labor, alternative object detection methods, such as weakly supervised or mixed-supervised learning, have also been proposed to reduce these costs. In this thesis, our focus is handling the object detection problem with minimal supervision. In this context, we first define a difficult scenario, namely zero-shot object detection (ZSD), where no visual training data is available for some of the target object classes. Secondly, we focus on the few-shot object detection (FSOD) problem and propose the novel meta-tuning principle. For the ZSD problem, we propose an approach that uses visual class embeddings and convex combinations of semantic embeddings in the classification part of single-stage object detectors. Following the proposed method, we focus on using more informative word embeddings, background modeling, and potential applications for ZSD methods. We first analyze the use of embedding vectors in deep models, since these vectors are an essential knowledge source for zero-shot learning (ZSL), and we propose a novel approach that transforms semantically meaningful word vectors into visually meaningful ones. We show that using the proposed visually meaningful word embedding vectors yields state-of-the-art results in the zero-shot classification (ZSC) problem. Then, we make the first attempt to handle background modeling in ZSD using a novel textual attention mechanism. Finally, we introduce a new problem within the scope of ZSD applications, which we call zero-shot image captioning (ZSIC), where the input images may contain unseen object instances. The proposed ZSIC method uses template-based sentence generators and fills the empty visual template slots with object proposals obtained from ZSD methods. In this context, we also propose a new evaluation metric called V-METEOR to evaluate caption quality more accurately for the ZSIC problem. In this thesis, we also focus on the FSOD problem and propose the meta-tuning principle, which allows us to model interpretable loss functions and data augmentation magnitudes in few-shot settings. Meta-tuning uses episodic learning as an intermediate learning step to learn inductive biases that boost FSOD. With the proposed RL-based meta-tuning approach, we model the loss function parameters and augmentation magnitudes, and obtain state-of-the-art results in the FSOD problem.
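As a rough illustration of what a "convex combination of semantic embeddings" can look like in zero-shot classification, the sketch below blends the word embeddings of the most probable seen classes and picks the nearest unseen class by cosine similarity. It is a generic toy example, not the thesis's detector: the function names, the `top_k` parameter, the class names, and the three-dimensional embeddings are all hypothetical.

```python
import math


def convex_combination_embedding(seen_probs, seen_embeddings, top_k=2):
    """Blend the embeddings of the top_k most probable seen classes.

    The weights are the renormalized class probabilities, so they are
    non-negative and sum to 1 (a convex combination).
    """
    ranked = sorted(seen_probs, key=seen_probs.get, reverse=True)[:top_k]
    total = sum(seen_probs[name] for name in ranked)
    dim = len(seen_embeddings[ranked[0]])
    blended = [0.0] * dim
    for name in ranked:
        weight = seen_probs[name] / total
        for i, value in enumerate(seen_embeddings[name]):
            blended[i] += weight * value
    return blended


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict_unseen(seen_probs, seen_embeddings, unseen_embeddings, top_k=2):
    """Return the unseen class whose embedding is closest to the blend."""
    blended = convex_combination_embedding(seen_probs, seen_embeddings, top_k)
    return max(unseen_embeddings,
               key=lambda name: cosine(blended, unseen_embeddings[name]))


# Hypothetical toy data: 3-d "word embeddings" and seen-class scores.
seen_embeddings = {
    "horse": [1.0, 0.0, 0.0],
    "tiger": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
unseen_embeddings = {
    "zebra": [0.7, 0.7, 0.0],
    "truck": [0.1, 0.0, 0.9],
}
seen_probs = {"horse": 0.5, "tiger": 0.4, "car": 0.1}

print(predict_unseen(seen_probs, seen_embeddings, unseen_embeddings))
```

In this toy setup, a region that looks partly like a horse and partly like a tiger is mapped to a point between those two embeddings, which lies closer to the unseen class "zebra" than to "truck".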

Suggestions

Object recognition and segmentation via shape models
Altınoklu, Metin Burak; Ulusoy, İlkay; Tarı, Zehra Sibel; Department of Electrical and Electronics Engineering (2016)
In this thesis, the problem of object detection, recognition and segmentation in computer vision is addressed with shape based methods. An efficient object detection method based on a sparse skeleton has been proposed. The proposed method is an improved chamfer template matching method for recognition of articulated objects. Using a probabilistic graphical model structure, shape variation is represented in a skeletal shape model, where nodes correspond to parts consisting of lines and edges correspond to pa...
Visual object detection and tracking using local convolutional context features and recurrent neural networks
Kaya, Emre Can; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2018)
Visual object detection and tracking are two major problems in computer vision which have important real-life application areas. During the last decade, Convolutional Neural Networks (CNNs) have received significant attention and outperformed methods that rely on handcrafted representations in both detection and tracking. On the other hand, Recurrent Neural Networks (RNNs) are commonly preferred for modeling sequential data such as video sequences. A novel convolutional context feature extension is introduc...
AN ANALYSIS ON THE EFFECT OF DYNAMIC RANGE ON OBJECT DETECTION WITH DEEP NEURAL NETWORKS
Koçdemir, İsmail Hakkı; Kalkan, Sinan; Alatan, Abdullah Aydın; Department of Computer Engineering (2021-10-8)
An important problem in computer vision, particularly in object detection, is being able to perceive objects even under challenging illumination conditions. Being robust to such conditions is especially important in applications such as autonomous driving. Despite the significance of the problem, existing autonomous driving systems use deep object detection networks with low-dynamic range (LDR) images during both the training phase and the testing phase. In this thesis, we investigate whether high-dynamic ...
Zero-Shot Object Detection by Hybrid Region Embedding
Demirel, Berkan; Cinbiş, Ramazan Gökberk; İkizler Cinbiş, Nazlı (2018-09-07)
Object detection is considered one of the most challenging problems in computer vision, since it requires correct prediction of both classes and locations of objects in images. In this study, we define a more difficult scenario, namely zero-shot object detection (ZSD), where no visual training data is available for some of the target object classes. We present a novel approach to tackle this ZSD problem, where a convex combination of embeddings is used in conjunction with a detection framework. For evalu...
Object Detection with Convolutional Context Features
Kaya, Emre Can; Alatan, Abdullah Aydın (2017-01-01)
A novel extension to the Huh B-ESA object detection algorithm is proposed in order to learn convolutional context features that better determine object boundaries. For input images, the hypothesis windows and the context around those windows are learned through convolutional layers as two parallel networks. The resulting object and context feature maps are combined in such a way that they preserve their spatial relationship. The proposed algorithm is trained and evaluated on PASCAL VOC 2007 detection ben...
Citation Formats
B. Demirel, “Object Detection with Minimal Supervision,” Ph.D. - Doctoral Program, Middle East Technical University, 2023.