Integrating near and long-range evidence for visual detection

2021-09
Samet, Nermin
This thesis presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a given location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies only on local evidence. On the COCO dataset, HoughNet's best model achieves 46.4 AP (and 65.1 AP_50), performing on par with the state of the art in bottom-up object detection and outperforming most major one-stage and two-stage methods.

We further validate the effectiveness of our proposal on other visual detection tasks, namely video object detection, instance segmentation, 3D object detection, keypoint detection for human pose estimation, whole-body human pose estimation and face detection, as well as on an additional "labels to photo" image generation task. In all cases, integrating our voting module improves performance. To demonstrate the effectiveness of our proposal on the whole-body human pose estimation task, we developed a bottom-up, one-stage method called HPRNet. HPRNet uses a hierarchical regression mechanism in which each whole-body keypoint is defined by its location relative to (i.e., an offset from) a specific point on the person box.

Within the scope of this thesis, we also propose PPDet, a one-stage, anchor-free object detector that integrates short-range interactions through voting. PPDet sum-pools the predictions stemming from individual features into a single prediction, which allows the model to reduce the contributions of non-discriminatory features during training.
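The voting step described above can be illustrated with a minimal, single-class sketch in Python. It is not HoughNet's implementation: the ring radii (1, 2, 4), the four angular sectors per ring, the undivided central disc, and the rule that a voter spreads each region's score evenly over that region's pixels are all assumptions chosen for illustration, and the helper names `num_regions`, `log_polar_region`, `region_sizes` and `vote` are hypothetical. The naive loops favor readability over speed.

```python
import math
from typing import List, Optional, Sequence

# evidence[y][x][k]: score, predicted at voter (y, x), that an object centre
# lies in vote-field region k relative to (y, x). For C classes one would
# keep one such map per class and call vote() once per class.
Evidence = List[List[List[float]]]


def num_regions(radii: Sequence[float], n_angles: int) -> int:
    # Assumed layout: one undivided central disc plus n_angles sectors per outer ring.
    return 1 + (len(radii) - 1) * n_angles


def log_polar_region(dy: int, dx: int, radii: Sequence[float],
                     n_angles: int) -> Optional[int]:
    """Map an offset (dy, dx) from voter to target onto a region index.

    radii are increasing ring boundaries (log-spaced here); returns None
    when the offset lies outside the vote field.
    """
    r = math.hypot(dy, dx)
    if r < radii[0]:
        return 0  # central disc: nearby evidence, no angular split
    for ring in range(1, len(radii)):
        if r < radii[ring]:
            theta = math.atan2(dy, dx) % (2 * math.pi)  # angle in [0, 2*pi)
            sector = int(theta / (2 * math.pi / n_angles)) % n_angles
            return 1 + (ring - 1) * n_angles + sector
    return None


def region_sizes(radii: Sequence[float], n_angles: int) -> List[int]:
    """Count the integer offsets in each region, used to spread a vote evenly."""
    sizes = [0] * num_regions(radii, n_angles)
    reach = math.ceil(radii[-1])
    for dy in range(-reach, reach + 1):
        for dx in range(-reach, reach + 1):
            k = log_polar_region(dy, dx, radii, n_angles)
            if k is not None:
                sizes[k] += 1
    return sizes


def vote(evidence: Evidence, radii: Sequence[float],
         n_angles: int) -> List[List[float]]:
    """Each voter splits its region-k score evenly over the pixels of region k
    (an assumption of this sketch); the output at a location is the sum of
    all votes cast on it."""
    height, width = len(evidence), len(evidence[0])
    sizes = region_sizes(radii, n_angles)
    reach = math.ceil(radii[-1])
    out = [[0.0] * width for _ in range(height)]
    for qy in range(height):
        for qx in range(width):
            for dy in range(-reach, reach + 1):
                for dx in range(-reach, reach + 1):
                    py, px = qy + dy, qx + dx
                    if not (0 <= py < height and 0 <= px < width):
                        continue  # target falls outside the image
                    k = log_polar_region(dy, dx, radii, n_angles)
                    if k is not None:
                        out[py][px] += evidence[qy][qx][k] / sizes[k]
    return out


if __name__ == "__main__":
    radii, n_angles = (1, 2, 4), 4
    n = num_regions(radii, n_angles)  # 9 regions
    ev = [[[0.0] * n for _ in range(9)] for _ in range(9)]
    ev[4][4][0] = 1.0  # near evidence: the object centre is right here
    ev[4][1][5] = 1.0  # long-range evidence: centre 2-4 px away, right of / below the voter
    votes = vote(ev, radii, n_angles)
    print(round(votes[4][4], 4))  # 1.1111
```

In the demo, the voter at (4, 1) lies three pixels from the target, so its evidence arrives through an outer ring and is shared by the nine pixels of that sector, while the voter at (4, 4) contributes its full score through the central disc; the target's score sums both. In this sketch, the coarser outer sectors are what let a fixed-size vote field reach distant locations with few regions.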

Suggestions

HoughNet: Integrating Near and Long-Range Evidence for Bottom-Up Object Detection
Samet, Nermin; Hicsonmez, Samet; Akbaş, Emre (2020-01-01)
This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby gen...
A FRAMEWORK FOR DETECTING COMPLEX EVENTS IN SURVEILLANCE VIDEOS
Onal, Itir; Kardas, Karani; Rezaeitabar, Yousef; Bayram, Ulya; Bal, Murat; Ulusoy, İlkay; Cicekli, Nihan Kesim (2013-07-19)
This paper presents a framework for detecting complex events in surveillance videos. Moving foreground objects are detected in the object detection component of the system, and the object recognition component decides whether these foreground objects are human. Each detected object is then tracked and labeled in the object tracking component, which also labels objects correctly under occlusion. The extracted information is fed to the event detection component. Rule based e...
Time-domain mapping of electromagnetic ray movement inside anisotropic spherical resonator
Biber, A; Golick, A; Tomak, Mehmet (2002-09-01)
This paper presents the analytical proof of the "Time-Domain Mapping Method" for a spherical resonator made of uniaxial crystal. In this way, the main types of caustics inside such a resonator, which were previously investigated numerically, are confirmed analytically. It is shown that the problem of ray flow inside the spherical resonator can be reduced to the problem of ray flow inside a spheroid-shaped metal cavity.
Improvements on one-stage object detection by visual reasoning
Aksoy, Tolga; Halıcı, Uğur; Department of Electrical and Electronics Engineering (2022-05-09)
Current state-of-the-art one-stage object detectors are limited because they treat each image region separately, without considering possible relations between objects. This makes successful detection depend solely on high-quality convolutional feature representations, which may not always be attainable under challenging conditions. In this thesis, a new architecture is proposed for one-stage object detection that reasons about the relations between image regions by using self-attention. T...
Moving object detection with supervised learning methods
Köksal, Aybora; Alatan, Abdullah Aydın; İnce, Kutalmış Gökalp; Department of Electrical and Electronics Engineering (2021-09-07)
In this thesis, the single-target object detection problem is examined. Object detection aims to identify all objects of interest, with their predefined classes, in an image or in a series of images. The main objective of this thesis is to exploit spatio-temporal information to improve performance in moving object detection. To this end, modern object detection algorithms based on CNN architectures are analyzed. Based on this analysis, state-of-the-art techniques whic...
Citation Formats
N. Samet, “Integrating near and long-range evidence for visual detection,” Ph.D. - Doctoral Program, Middle East Technical University, 2021.