Rescoring detections based on contextual scores in object detection

2019
Zorlu, Ersan Vural
To detect objects in an image, current state-of-the-art object detectors first define candidate object locations and then classify each of them into one of the predefined categories or as background. They do so by using visual features extracted locally from the candidate locations, omitting the rich contextual information embedded in the whole image. Contextual information can be utilized to complement the information extracted locally and thereby improve object detection accuracy. Researchers have proposed many models that exploit scene-level and/or instance-level context by using non-local features from the same image. In this work, we propose models to improve object detection by utilizing contextual information embedded in the confidence scores of detections in the whole image, without using any visual features. Our models use object-to-object spatial and scale-related relationships and work as a post-processing step that can be plugged into any object detector. Specifically, for a reference detection output by the base object detector, our model first defines a variety of spatial and scale-based regions relative to the location of the reference detection. Then, each of these regions is summarized by the confidence scores of the detections inside it. Next, the confidence score of the reference detection and the contextual confidence scores are processed by our models. We propose three variants based on multilayer perceptrons. We evaluate our models in conjunction with the state-of-the-art RetinaNet object detector on the widely used MSCOCO benchmark dataset, where we show that our models improve average precision by up to 1.8 points.
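The region-and-summary step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the four spatial regions, the three scale regions, the area-ratio threshold, the use of the maximum score as the summary, and all names (`Detection`, `context_features`, and so on) are assumptions made for illustration only. The resulting vector is the kind of input a multilayer perceptron could take to produce a rescored confidence.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    cx: float     # box centre x
    cy: float     # box centre y
    w: float      # box width
    h: float      # box height
    score: float  # confidence from the base detector

# Hypothetical region layout; the thesis may define regions differently.
SPATIAL = ("left", "right", "above", "below")
SCALE = ("smaller", "similar", "larger")

def spatial_region(ref: Detection, other: Detection) -> str:
    """Assign `other` to a coarse direction relative to `ref` (image y grows downward)."""
    dx, dy = other.cx - ref.cx, other.cy - ref.cy
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "below" if dy > 0 else "above"

def scale_region(ref: Detection, other: Detection, ratio: float = 2.0) -> str:
    """Compare box areas; `ratio` is an assumed threshold."""
    r = (other.w * other.h) / (ref.w * ref.h)
    if r < 1.0 / ratio:
        return "smaller"
    if r > ratio:
        return "larger"
    return "similar"

def context_features(ref: Detection, detections: List[Detection]) -> List[float]:
    """Return [ref score, then one summary score per (spatial, scale) region].

    Each region is summarized by the maximum confidence of the detections
    falling inside it (0.0 if empty); this choice of summary is an assumption.
    """
    best = {(s, k): 0.0 for s in SPATIAL for k in SCALE}
    for det in detections:
        if det is ref:
            continue  # the reference detection is not its own context
        key = (spatial_region(ref, det), scale_region(ref, det))
        best[key] = max(best[key], det.score)
    return [ref.score] + [best[(s, k)] for s in SPATIAL for k in SCALE]
```

With this layout, each reference detection yields a 13-dimensional vector (1 own score plus 4 × 3 region summaries) that uses no visual features, only the scores and box geometry of the other detections in the same image.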

Suggestions

Object Detection with Convolutional Context Features
Kaya, Emre Can; Alatan, Abdullah Aydın (2017-01-01)
A novel extension to Huh B-ESA object detection algorithm is proposed in order to learn convolutional context features for determining boundaries of objects better. For input images, the hypothesis windows and their context around those windows are learned through convolutional layers as two parallel networks. The resulting object and context feature maps are combined in such a way that they preserve their spatial relationship. The proposed algorithm is trained and evaluated on PASCAL VOC 2007 detection ben...
Utilization of dense depth information for monoview object detection and instance segmentation
Çakırgöz, Çağlayan Can; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2022-5-10)
Object detection aims to detect objects of certain classes in an image by bounding them in rectangular boxes, whereas instance segmentation tries to detect objects at the pixel level. Deep learning techniques, which have shown great improvements over the last decade, are utilized in these topics as well, and significant success has been achieved over traditional methods. Similar improvements can be observed in dense depth estimation which deals with deducing dense information of a scene from a single imag...
Scale invariant representation of 2.5D data
AKAGUNDUZ, Erdem; ULUSOY PARNAS, İLKAY; BOZKURT, Nesli; Halıcı, Uğur (2007-06-13)
In this paper, a scale and orientation invariant feature representation for 2.5D objects is introduced, which may be used to classify, detect and recognize objects even under the cases of cluttering and/or occlusion. With this representation a 2.5D object is defined by an attributed graph structure, in which the nodes are the pit and peak regions on the surface. The attributes of the graph are the scales, positions and the normals of these pits and peaks. In order to detect these regions a "peakness" (or pi...
Fine-grained object recognition and zero-shot learning in multispectral imagery
Sumbul, Gencer; Cinbiş, Ramazan Gökberk; AKSOY, SELİM (2018-05-05)
We present a method for the fine-grained object recognition problem, which aims to recognize the type of an object among a large number of sub-categories, and for the zero-shot learning scenario on multispectral images. In order to establish a relation between seen classes and new unseen classes, a compatibility function between image features extracted from a convolutional neural network and auxiliary information of classes is learnt. Knowledge transfer for unseen classes is carried out by maximizing this function. Per...
Multisource region attention network for fine-grained object recognition in remote sensing imagery
Sümbül, Gencer; Cinbiş, Ramazan Gökberk; Aksoy, Selim (Institute of Electrical and Electronics Engineers (IEEE), 2019-07)
Fine-grained object recognition concerns the identification of the type of an object among a large number of closely related subcategories. Multisource data analysis that aims to leverage the complementary spectral, spatial, and structural information embedded in different sources is a promising direction toward solving the fine-grained recognition problem that involves low between-class variance, small training set sizes for rare classes, and class imbalance. However, the common assumption of coregistered ...
Citation Formats
E. V. Zorlu, “Rescoring detections based on contextual scores in object detection,” Thesis (M.S.) -- Graduate School of Natural and Applied Sciences. Computer Engineering., Middle East Technical University, 2019.