Improvements on one-stage object detection by visual reasoning

2022-05-09
Aksoy, Tolga
Current state-of-the-art one-stage object detectors are limited because they treat each image region separately, without considering possible relations between objects. As a result, successful detection depends solely on high-quality convolutional feature representations, which may not be obtainable under challenging conditions. In this thesis, a new architecture is proposed for one-stage object detection that reasons about the relations between image regions by using self-attention. The proposed reasoning method considers semantic coherency between image regions and enhances the features of these regions. The spatially and semantically enhanced features are fused with the original features to improve performance. The proposed approach is applied to current state-of-the-art real-time one-stage object detectors such as YOLOv3, YOLOv4 and YOLOR, and then evaluated on COCO in terms of mAP.
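The idea of relating image regions through self-attention and fusing the result with the original features can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function name `reason_over_regions`, the random projection matrices, the single attention head, the treatment of each grid cell as one region, and the additive fusion are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def reason_over_regions(features, w_q, w_k, w_v):
    """Single-head self-attention over the cells of a feature map.

    features: (H, W, C) array; each grid cell is treated as one image region
              (an assumption of this sketch).
    w_q, w_k: (C, D) query and key projections (hypothetical parameters).
    w_v:      (C, C) value projection, kept at C channels so the output
              can be added back to the input.
    """
    h, w, c = features.shape
    regions = features.reshape(h * w, c)        # (N, C), N = H * W regions

    q = regions @ w_q                           # (N, D)
    k = regions @ w_k                           # (N, D)
    v = regions @ w_v                           # (N, C)

    # Pairwise affinity between regions, scaled as in dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (N, N)
    attn = softmax(scores, axis=-1)             # each row sums to 1

    enhanced = attn @ v                         # (N, C) relation-aware features

    # Assumption: fuse by simple addition; the source only states that the
    # enhanced and original features are fused, not how.
    fused = regions + enhanced
    return fused.reshape(h, w, c), attn


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 4, 8))    # toy 4x4 grid, 8 channels
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))

fused_map, attn = reason_over_regions(feature_map, w_q, w_k, w_v)
print(fused_map.shape, attn.shape)
```

In this sketch every region attends to every other region, so the cost grows quadratically with the number of grid cells; a real detector head would apply such a step to its own feature maps with learned rather than random projections.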

Suggestions

Rescoring detections based on contextual scores in object detection
Zorlu, Ersan Vural; Akbaş, Emre; Department of Computer Engineering (2019)
To detect objects in an image, current state-of-the-art object detectors firstly define candidate object locations, and then classify each of them into one of the predefined categories or as background. They do so by using the visual features extracted locally from the candidate locations; omitting the rich contextual information embedded in the whole image. Contextual information can be utilized to complement the information extracted locally and thereby to improve object detection accuracy. Researchers have p...
IMPROVING PROPOSAL-BASED OBJECT DETECTION USING CONVOLUTIONAL CONTEXT FEATURES
Kaya, Emre Can; Alatan, Abdullah Aydın (2018-10-10)
A novel extension to proposal-based detection is proposed in order to learn convolutional context features for determining boundaries of objects better. Objects and their context are aimed to be learned through parallel convolutional stages. The resulting object and context feature maps are combined in such a way that they preserve their spatial relationship. The proposed algorithm is trained and evaluated on PASCAL VOC 2007 detection benchmark dataset and yielded improvements in performance over baseline, ...
A Computationally Efficient Appearance-Based Algorithm for Geospatial Object Detection
Arslan, Duygu; Alatan, Abdullah Aydın (2012-04-27)
A computationally efficient appearance-based algorithm for geospatial object detection is presented and evaluated specifically for aircraft detection from satellite imagery. An aircraft operator exploiting the edge information via gray level differences between the aircraft and its background is constructed with Haar-like polygon regions by using the shape information of the aircraft as an invariant. Fast evaluation of the aircraft operator is achieved by means of integral image. Rotated integral images are...
A multimodal approach for individual tracking of people and their belongings
Beyan, Çiğdem; Temizel, Alptekin (2015-04-01)
In this study, a fully automatic surveillance system for indoor environments which is capable of tracking multiple objects using both visible and thermal band images is proposed. These two modalities are fused to track people and the objects they carry separately using their heat signatures and the owners of the belongings are determined. Fusion of complementary information from different modalities (for example, thermal images are not affected by shadows and there is no thermal reflection or halo effect in...
Scale invariant representation of 2.5D data
AKAGUNDUZ, Erdem; ULUSOY PARNAS, İLKAY; BOZKURT, Nesli; Halıcı, Uğur (2007-06-13)
In this paper, a scale and orientation invariant feature representation for 2.5D objects is introduced, which may be used to classify, detect and recognize objects even under the cases of cluttering and/or occlusion. With this representation a 2.5D object is defined by an attributed graph structure, in which the nodes are the pit and peak regions on the surface. The attributes of the graph are the scales, positions and the normals of these pits and peaks. In order to detect these regions a "peakness" (or pi...
Citation Formats
T. Aksoy, “Improvements on one-stage object detection by visual reasoning,” M.S. - Master of Science, Middle East Technical University, 2022.