Improvements on one-stage object detection by visual reasoning

2022-05-09
Aksoy, Tolga
Current state-of-the-art one-stage object detectors are limited in that they treat each image region separately, without considering possible relations between objects. As a result, they depend solely on high-quality convolutional feature representations to detect objects successfully, and such representations may not be attainable under challenging conditions. In this thesis, a new architecture is proposed for one-stage object detection that reasons about the relations between image regions by using self-attention. The proposed reasoning method considers semantic coherency between image regions and enhances the features of these regions. The spatially and semantically enhanced features are fused with the original features to improve performance. The proposed approach is applied to current state-of-the-art real-time one-stage object detectors such as YOLOv3, YOLOv4 and YOLOR, and then evaluated on COCO in terms of mAP.
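The idea of reasoning over image regions with self-attention and fusing the result with the original features can be illustrated with a minimal NumPy sketch. This is not the thesis architecture: the single-head scaled dot-product attention, the fusion by element-wise addition, the function name reason_over_regions, and the random projection matrices are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def reason_over_regions(features, w_q, w_k, w_v):
    """Hypothetical helper: self-attention over the cells of a feature map.

    features: (H, W, C) convolutional feature map, one vector per image region.
    w_q, w_k, w_v: (C, C) projection matrices (assumed, normally learned).
    Returns the fused (H, W, C) map and the (H*W, H*W) attention weights.
    """
    h, w, c = features.shape
    regions = features.reshape(h * w, c)  # one row per grid cell

    q = regions @ w_q
    k = regions @ w_k
    v = regions @ w_v

    # Each region attends to every other region (scaled dot-product attention).
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    enhanced = attn @ v

    # Assumed fusion: add the enhanced features back to the originals.
    fused = regions + enhanced
    return fused.reshape(h, w, c), attn


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 4, 8))  # toy 4x4 grid, 8 channels
w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))

fused, attn = reason_over_regions(feature_map, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Each row of attn describes how strongly one grid cell draws on every other cell, which is the sense in which a region's features are enhanced by related regions. A real detector would learn the projections and might fuse by concatenation followed by a convolution instead of addition.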

Suggestions

Rescoring detections based on contextual scores in object detection
Zorlu, Ersan Vural; Akbaş, Emre; Department of Computer Engineering (2019)
To detect objects in an image, current state-of-the-art object detectors first define candidate object locations, and then classify each of them into one of the predefined categories or as background. They do so by using the visual features extracted locally from the candidate locations, omitting the rich contextual information embedded in the whole image. Contextual information can be utilized to complement the information extracted locally and thereby to improve object detection accuracy. Researchers have p...
New models and inference techniques for Gaussian process-based extended object tracking
Kumru, Murat; Özkan, Emre; Department of Electrical and Electronics Engineering (2022-09-09)
In this thesis, we consider the problem of tracking dynamic objects with unknown shapes using point cloud measurements generated by, e.g., lidars, radars, and depth cameras. The point measurements do not only convey information about the object pose, i.e., position and orientation, but they also naturally reveal the characteristics of its latent extent. Aiming to harness the full potential of the available information, we investigate the Gaussian process-based extended object tracking (GPEOT) framework. W...
A Computationally Efficient Appearance-Based Algorithm for Geospatial Object Detection
Arslan, Duygu; Alatan, Abdullah Aydın (2012-04-27)
A computationally efficient appearance-based algorithm for geospatial object detection is presented and evaluated specifically for aircraft detection from satellite imagery. An aircraft operator exploiting the edge information via gray level differences between the aircraft and its background is constructed with Haar-like polygon regions by using the shape information of the aircraft as an invariant. Fast evaluation of the aircraft operator is achieved by means of integral image. Rotated integral images are...
Scale invariant representation of 2.5D data
AKAGUNDUZ, Erdem; ULUSOY PARNAS, İLKAY; BOZKURT, Nesli; Halıcı, Uğur (2007-06-13)
In this paper, a scale and orientation invariant feature representation for 2.5D objects is introduced, which may be used to classify, detect and recognize objects even in the presence of clutter and/or occlusion. With this representation, a 2.5D object is defined by an attributed graph structure in which the nodes are the pit and peak regions on the surface. The attributes of the graph are the scales, positions and normals of these pits and peaks. In order to detect these regions a "peakness" (or pi...
A multimodal approach for individual tracking of people and their belongings
Beyan, Çiğdem; Temizel, Alptekin (2015-04-01)
In this study, a fully automatic surveillance system for indoor environments which is capable of tracking multiple objects using both visible and thermal band images is proposed. These two modalities are fused to track people and the objects they carry separately using their heat signatures and the owners of the belongings are determined. Fusion of complementary information from different modalities (for example, thermal images are not affected by shadows and there is no thermal reflection or halo effect in...
Citation Formats
T. Aksoy, “Improvements on one-stage object detection by visual reasoning,” M.S. - Master of Science, Middle East Technical University, 2022.