Improvements on one-stage object detection by visual reasoning

Aksoy, Tolga
Current state-of-the-art one-stage object detectors are limited because they treat each image region separately, without considering possible relations between objects. As a result, they depend solely on high-quality convolutional feature representations to detect objects successfully. Such representations cannot always be obtained under challenging conditions. In this thesis, a new architecture is proposed for one-stage object detection that reasons about the relations between image regions by using self-attention. The proposed reasoning method considers semantic coherency between image regions and enhances the features of these regions. The spatially and semantically enhanced features are fused with the original features to improve performance. The proposed approach is applied to current state-of-the-art real-time one-stage object detectors such as YOLOv3, YOLOv4 and YOLOR, and then evaluated on COCO in terms of mAP.
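
The idea of relating image regions through self-attention and fusing the result with the original features can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes each region is a plain feature vector, uses the raw features as queries, keys and values (no learned projections), and fuses by element-wise addition, which is only one possible choice. The function names `self_attention` and `fuse` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(regions):
    """Scaled dot-product self-attention over region feature vectors.

    Assumption: queries, keys and values are the raw region features;
    a real detector would apply learned projections first.
    """
    dim = len(regions[0])
    scale = math.sqrt(dim)
    enhanced = []
    for query in regions:
        # Similarity of this region to every region (including itself).
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in regions]
        weights = softmax(scores)
        # Each enhanced feature is a weighted mix of all region features.
        enhanced.append([sum(w * value[i] for w, value in zip(weights, regions))
                         for i in range(dim)])
    return enhanced


def fuse(original, enhanced):
    """Fuse enhanced and original features.

    Assumption: element-wise addition; the abstract does not state
    how the fusion is performed.
    """
    return [[o + e for o, e in zip(orig_vec, enh_vec)]
            for orig_vec, enh_vec in zip(original, enhanced)]


# Three toy regions: the first two are semantically similar.
regions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
enhanced = self_attention(regions)
fused = fuse(regions, enhanced)
```

In this toy input, the first region attends mostly to itself and to the similar second region, so its enhanced feature stays close to theirs, while the fused output keeps the original feature as a residual term.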


T. Aksoy, “Improvements on one-stage object detection by visual reasoning,” M.S. - Master of Science, Middle East Technical University, 2022.