Rescoring detections based on contextual scores in object detection

Zorlu, Ersan Vural
To detect objects in an image, current state-of-the-art object detectors first define candidate object locations and then classify each of them into one of the predefined categories or as background. They do so using visual features extracted locally from the candidate locations, omitting the rich contextual information embedded in the whole image. Contextual information can complement the locally extracted information and thereby improve object detection accuracy. Researchers have proposed many models that exploit scene-level and/or instance-level context by using non-local features from the same image. In this work, we propose models that improve object detection by utilizing the contextual information embedded in the confidence scores of the detections across the whole image, without using any visual features. Our models use object-to-object spatial and scale-related relationships and work as a post-processing step that can be plugged into any object detector. Specifically, for a reference detection output by the base object detector, our model first defines a variety of spatial and scale-based regions relative to the location of the reference detection. Then, each of these regions is summarized by the confidence scores of the detections inside it. Next, the confidence scores of the reference detection and the contextual confidence scores are processed by our models. We propose three variants based on multilayer perceptrons. We evaluate our models in conjunction with the state-of-the-art RetinaNet object detector on the widely used MSCOCO benchmark dataset, where we show that our models improve average precision by up to 1.8 points.
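The rescoring pipeline described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the exact regions, how a region is summarized, or the architectures of the three MLP variants. The four spatial regions (left, right, above, below), the three scale regions (smaller, similar, larger), the per-class max-score pooling, the single-hidden-layer MLP, and every name in the code (`Detection`, `spatial_region`, `scale_region`, `feature_vector`, `init_mlp`, `mlp_rescore`, `rescore_all`) are hypothetical. The weights here are random; a real rescorer would presumably be trained on the base detector's outputs against ground truth.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Detection:
    """A base-detector output: box corners (x1, y1, x2, y2), confidence, class id."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: int

    def center(self):
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)

    def area(self):
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

# Hypothetical region vocabulary (assumption): four spatial and three scale-based regions.
SPATIAL = ("left", "right", "above", "below")
SCALE = ("smaller", "similar", "larger")
REGIONS = SPATIAL + SCALE

def spatial_region(ref, det):
    """Place det in one of four regions by the dominant axis of its center offset."""
    (rx, ry), (dx, dy) = ref.center(), det.center()
    ox, oy = dx - rx, dy - ry
    if abs(ox) >= abs(oy):
        return "left" if ox < 0 else "right"
    return "above" if oy < 0 else "below"  # image y grows downward

def scale_region(ref, det, tol=0.5):
    """Bucket det by the log ratio of its area to the reference area (tol is an assumption)."""
    ratio = math.log((det.area() + 1e-9) / (ref.area() + 1e-9))
    if ratio < -tol:
        return "smaller"
    if ratio > tol:
        return "larger"
    return "similar"

def feature_vector(ref, dets, num_classes):
    """[reference score] + max score per (region, class) over the other detections."""
    ctx = {(r, c): 0.0 for r in REGIONS for c in range(num_classes)}
    for d in dets:
        if d is ref:
            continue  # the reference does not provide context for itself
        for r in (spatial_region(ref, d), scale_region(ref, d)):
            ctx[(r, d.label)] = max(ctx[(r, d.label)], d.score)
    return [ref.score] + [ctx[(r, c)] for r in REGIONS for c in range(num_classes)]

def init_mlp(n_in, n_hidden, seed=0):
    """Random (untrained) weights for a hypothetical one-hidden-layer MLP."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0

def mlp_rescore(x, params):
    """ReLU hidden layer followed by a sigmoid output giving the new confidence."""
    w1, b1, w2, b2 = params
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    z = sum(w * hi for w, hi in zip(w2, h)) + b2
    return 1.0 / (1.0 + math.exp(-z))

def rescore_all(dets, num_classes, params):
    """Post-processing step: replace each detection's score with the MLP output."""
    return [mlp_rescore(feature_vector(d, dets, num_classes), params) for d in dets]

if __name__ == "__main__":
    dets = [
        Detection(100, 100, 200, 300, 0.62, label=0),  # reference-sized box
        Detection(210, 150, 300, 250, 0.55, label=1),  # smaller box to its right
        Detection(120, 310, 180, 340, 0.20, label=1),  # small box below it
    ]
    num_classes = 2
    params = init_mlp(1 + len(REGIONS) * num_classes, n_hidden=16)
    print(rescore_all(dets, num_classes, params))
```

Because the features are built only from boxes, class labels, and scores, the step needs no image pixels or backbone features, which is what lets it sit after any detector. Max pooling keeps the feature length fixed regardless of how many detections fall in a region; sums or score histograms would be equally plausible assumptions.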
Citation Formats
E. V. Zorlu, “Rescoring detections based on contextual scores in object detection,” M.S. thesis, Graduate School of Natural and Applied Sciences, Computer Engineering, Middle East Technical University, 2019.