Integrating near and long-range evidence for visual detection
Date
2021-9
Author
Samet, Nermin
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This thesis presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies on only local evidence. On the COCO dataset, HoughNet's best model achieves 46.4 AP (and 65.1 AP_50), performing on par with the state-of-the-art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of our proposal in other visual detection tasks, namely video object detection, instance segmentation, 3D object detection, keypoint detection for human pose estimation and whole-body human pose estimation, face detection, and an additional "labels to photo" image generation task, where the integration of our voting module consistently improves performance in all cases. To show the effectiveness of our proposal on the whole-body human pose estimation task, we developed a bottom-up, one-stage method called HPRNet. In HPRNet, we build a hierarchical regression mechanism, where we define each of the whole-body keypoints with a relative location (i.e., an offset) to a specific point on the person box. In the context of this thesis, we also propose a one-stage, anchor-free object detector, PPDet, which integrates short-range interactions through voting. PPDet sum-pools predictions stemming from individual features into a single prediction, which allows the model to reduce the contributions of non-discriminatory features during training.
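The voting idea in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: it assumes a single-class evidence map stored as a list of lists, one scalar weight per vote-field region, and hypothetical names (`log_polar_region`, `accumulate_votes`, `ring_radii`, `num_angle_bins`) chosen only for this example. Each location's score is the weighted sum of evidence from the surrounding locations, with the regions laid out as rings of growing radius split into angular bins.

```python
import math

def log_polar_region(dy, dx, ring_radii, num_angle_bins):
    """Map an offset (dy, dx) to a region index of a log-polar-style vote
    field, or None when the offset lies outside the outermost ring.
    Region 0 is the centre cell; the remaining regions are (ring, angle) bins."""
    r = math.hypot(dy, dx)
    if r == 0:
        return 0
    for ring, outer_radius in enumerate(ring_radii):
        if r <= outer_radius:
            angle = math.atan2(dy, dx) % (2 * math.pi)
            # Clamp guards against the angle rounding up to exactly 2*pi.
            angle_bin = min(int(angle / (2 * math.pi) * num_angle_bins),
                            num_angle_bins - 1)
            return 1 + ring * num_angle_bins + angle_bin
    return None

def accumulate_votes(evidence, region_weights, ring_radii, num_angle_bins):
    """Score every location as the weighted sum of the evidence found in
    its vote field (illustrative weights, one per region)."""
    height, width = len(evidence), len(evidence[0])
    max_radius = int(ring_radii[-1])
    scores = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(-max_radius, max_radius + 1):
                for dx in range(-max_radius, max_radius + 1):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < height and 0 <= xx < width):
                        continue  # voter falls outside the map
                    region = log_polar_region(dy, dx, ring_radii, num_angle_bins)
                    if region is None:
                        continue  # voter is beyond the vote field
                    total += region_weights[region] * evidence[yy][xx]
            scores[y][x] = total
    return scores

# Toy usage: a single strong piece of evidence in the middle of a 5x5 map.
evidence = [[0.0] * 5 for _ in range(5)]
evidence[2][2] = 1.0
ring_radii = [1, 2]      # assumed ring radii, growing outward
num_angle_bins = 4       # assumed angular resolution
weights = [1.0] * (1 + len(ring_radii) * num_angle_bins)
scores = accumulate_votes(evidence, weights, ring_radii, num_angle_bins)
```

With uniform weights, every location whose vote field covers the centre pixel receives its evidence, while locations farther away than the outermost ring receive nothing. In a learned detector the per-region contributions would be predicted by the network rather than fixed by hand.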
Subject Keywords
Object detection, Voting, Bottom-up recognition, Hough Transform, Video object detection, Instance segmentation, 3D object detection, Human pose estimation, Whole-body human pose estimation, Face detection, Image-to-image translation, Label-to-image translation
URI
https://hdl.handle.net/11511/92178
Collections
Graduate School of Natural and Applied Sciences, Thesis
Suggestions
HoughNet: Integrating Near and Long-Range Evidence for Visual Detection
Samet, Nermin; Hicsonmez, Samet; Akbaş, Emre (2022-1-01)
This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby...
HoughNet: Integrating Near and Long-Range Evidence for Bottom-Up Object Detection
Samet, Nermin; Hicsonmez, Samet; Akbaş, Emre (2020-01-01)
This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby gen...
HPRNet: Hierarchical point regression for whole-body human pose estimation
SAMET, NERMİN; Akbaş, Emre (2021-11-01)
In this paper, we present a new bottom-up one-stage method for whole-body pose estimation, which we call “hierarchical point regression,” or HPRNet for short. In standard body pose estimation, the locations of ~17 major joints on the human body are estimated. Differently, in whole-body pose estimation, the locations of fine-grained keypoints (68 on face, 21 on each hand and 3 on each foot) are estimated as well, which creates a scale variance problem that needs to be addressed. To handle the scale variance ...
Time-domain mapping of electromagnetic ray movement inside anisotropic spherical resonator
Biber, A; Golick, A; Tomak, Mehmet (2002-09-01)
This paper presents the analytical proof of the "Time-Domain Mapping Method" for a spherical resonator made of a uniaxial crystal. In this way, the main types of caustics inside such a resonator, which were previously investigated numerically, are confirmed analytically. It is shown that the problem of the ray flow inside the spherical resonator can be reduced to the problem of the ray flow inside a metal cavity shaped as a spheroid.
Posterior Cramér-Rao Lower Bounds for Extended Target Tracking with Random Matrices
Sarıtaş, Elif; Orguner, Umut (2016-07-08)
This paper presents posterior Cramér-Rao lower bounds (PCRLB) for extended target tracking (ETT) when the extent states of the targets are represented with random matrices. PCRLB recursions are derived for kinematic and extent states taking complicated expectations involving Wishart and inverse Wishart distributions. For some analytically intractable expectations, Monte Carlo integration is used. The bounds for the semi-major and minor axes of the extent ellipsoid are obtained as well as those for the exte...
Citation Formats
IEEE
N. Samet, “Integrating near and long-range evidence for visual detection,” Ph.D. - Doctoral Program, Middle East Technical University, 2021.