Integrating near and long-range evidence for visual detection

Date

2021-9

Author

Samet, Nermin

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

This thesis presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies on only local evidence. On the COCO dataset, HoughNet's best model achieves 46.4 AP (and 65.1 AP_50), performing on par with the state of the art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of our proposal in other visual detection tasks, namely video object detection, instance segmentation, 3D object detection, keypoint detection for human pose estimation and whole-body human pose estimation, and face detection, as well as an additional "labels to photo" image generation task; integrating our voting module improves performance in all of these cases. To show the effectiveness of our proposal on the whole-body human pose estimation task, we developed a bottom-up, one-stage method called HPRNet. In HPRNet, we build a hierarchical regression mechanism in which each whole-body keypoint is defined by its location relative to (i.e., its offset from) a specific point on the person box. In this thesis, we also propose a one-stage, anchor-free object detector, PPDet, which integrates short-range interactions through voting. PPDet sum-pools the predictions stemming from individual features into a single prediction, which allows the model to reduce the contributions of non-discriminatory features during training.
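The voting step described in the abstract can be illustrated with a small NumPy sketch. This is not the thesis implementation: the function names `log_polar_vote_field` and `accumulate_votes`, the ring radii, the number of angular sectors, and the toy evidence values are all assumptions chosen for illustration. The sketch only shows the general idea that each location casts region-specific votes through a log-polar field centered on itself, and that the score at a location is the sum of the votes it receives from near and far voters.

```python
import numpy as np


def log_polar_vote_field(radii, n_angles):
    """Build an integer region map of shape (2R+1, 2R+1), where R = radii[-1].

    Region 0 is the central disk (distance <= radii[0]). Each outer ring is
    split into n_angles angular sectors. Cells outside the field are -1.
    The radii and sector count are illustrative, not the thesis settings.
    """
    R = radii[-1]
    ys, xs = np.mgrid[-R:R + 1, -R:R + 1]
    dist = np.hypot(xs, ys)
    angle = np.mod(np.arctan2(ys, xs), 2 * np.pi)
    sector = np.minimum((angle / (2 * np.pi) * n_angles).astype(int),
                        n_angles - 1)
    field = np.full(dist.shape, -1, dtype=int)
    field[dist <= radii[0]] = 0
    for ring in range(1, len(radii)):
        in_ring = (dist > radii[ring - 1]) & (dist <= radii[ring])
        field[in_ring] = 1 + (ring - 1) * n_angles + sector[in_ring]
    return field


def accumulate_votes(evidence, field):
    """Sum the votes for one class.

    evidence has shape (n_regions, H, W). The voter at (y, x) adds
    evidence[r, y, x] to every target (y + dy, x + dx) whose offset
    (dy, dx) lies in region r of the vote field.
    """
    _, H, W = evidence.shape
    R = field.shape[0] // 2
    padded = np.zeros((H + 2 * R, W + 2 * R))
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            r = field[dy + R, dx + R]
            if r < 0:
                continue  # this offset lies outside the vote field
            # Shift the whole evidence map for region r by (dy, dx).
            padded[R + dy:R + dy + H, R + dx:R + dx + W] += evidence[r]
    return padded[R:R + H, R:R + W]


# Toy example: a 9x9 map, one central disk plus one ring with 4 sectors.
field = log_polar_vote_field(radii=[1, 3], n_angles=4)
n_regions = int(field.max()) + 1
evidence = np.zeros((n_regions, 9, 9))
evidence[0, 4, 4] = 1.0  # near evidence: the location votes for itself
evidence[1, 4, 1] = 2.0  # long-range evidence from 3 pixels to the left
heat = accumulate_votes(evidence, field)
print(heat[4, 4])  # 3.0
```

In the toy example, location (4, 4) collects a vote of 1.0 from itself and a vote of 2.0 from a voter three pixels away, so its score combines local and long-range evidence. A practical detector would predict the evidence tensor with a network and run this accumulation per class, typically with a vectorized operation rather than Python loops.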

Subject Keywords

Object detection, Voting, Bottom-up recognition, Hough Transform, Video object detection, Instance segmentation, 3D object detection, Human pose estimation, Whole-body human pose estimation, Face detection, Image-to-image translation, Label-to-image translation

URI

https://hdl.handle.net/11511/92178

Collections

Graduate School of Natural and Applied Sciences, Thesis

Suggestions


HoughNet: Integrating Near and Long-Range Evidence for Visual Detection. Samet, Nermin; Hicsonmez, Samet; Akbaş, Emre (2022-1-01). This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby...
HoughNet: Integrating Near and Long-Range Evidence for Bottom-Up Object Detection. Samet, Nermin; Hicsonmez, Samet; Akbaş, Emre (2020-01-01). This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby gen...
HPRNet: Hierarchical point regression for whole-body human pose estimation. SAMET, NERMİN; Akbaş, Emre (2021-11-01). In this paper, we present a new bottom-up one-stage method for whole-body pose estimation, which we call “hierarchical point regression,” or HPRNet for short. In standard body pose estimation, the locations of ~17 major joints on the human body are estimated. Differently, in whole-body pose estimation, the locations of fine-grained keypoints (68 on face, 21 on each hand and 3 on each foot) are estimated as well, which creates a scale variance problem that needs to be addressed. To handle the scale variance ...
Time-domain mapping of electromagnetic ray movement inside anisotropic spherical resonator. Biber, A; Golick, A; Tomak, Mehmet (2002-09-01). This paper presents the analytical proof of "Time-Domain Mapping Method" for the spherical resonator made up of uniaxial crystal. In this way, the main types of caustics inside the spherical resonator made up of uniaxial crystal, which were investigated numerically before, are confirmed analytically. It is engraved that the problem of the ray flow inside the spherical resonator can be reduced to the problem of the ray flow inside metal cavity shaped as spheroid.
Posterior Cramér-Rao Lower Bounds for Extended Target Tracking with Random Matrices. Sarıtaş, Elif; Orguner, Umut (2016-07-08). This paper presents posterior Cramér-Rao lower bounds (PCRLB) for extended target tracking (ETT) when the extent states of the targets are represented with random matrices. PCRLB recursions are derived for kinematic and extent states taking complicated expectations involving Wishart and inverse Wishart distributions. For some analytically intractable expectations, Monte Carlo integration is used. The bounds for the semi-major and minor axes of the extent ellipsoid are obtained as well as those for the exte...

Citation Formats

N. Samet, “Integrating near and long-range evidence for visual detection,” Ph.D. - Doctoral Program, Middle East Technical University, 2021.