PriorBox: long-tail calibration with priors

2022-8-31
Dursun, Abdullah
Deep learning has brought considerable improvements to computer vision, especially in recognition problems such as image classification, object detection, semantic segmentation, instance segmentation, and keypoint detection. These problems have critical real-world applications, especially in the search, social media, and surveillance domains. Unfortunately, there is still a substantial accuracy gap between research datasets and real-world deployments, caused by differences in data distribution. In particular, most detection methods show a noticeable accuracy drop on datasets with long-tailed distributions because they are biased towards frequent classes. This thesis describes PriorBox, which learns calibration factors for long-tail datasets using class distributions and a simple convolutional neural network. Since PriorBox uses easy-to-collect distributional and spatial priors, it does not introduce any data collection steps. Furthermore, the proposed method does not rely on typical class-rebalancing and loss manipulation strategies, and it works well with existing object detection and instance segmentation models. Simple distributional class priors, such as the number of instances, size, and aspect ratio, are shown to improve detection results on rare classes without a significant impact on inference speed. We thoroughly evaluate the approach on the LVIS dataset, using a Mask R-CNN baseline, on long-tail object detection and instance segmentation tasks.
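As an illustration of the kind of distributional class priors the abstract mentions (number of instances, size, and aspect ratio), the following is a minimal sketch of how such statistics might be gathered from box annotations. It is not the thesis implementation: the function name `class_priors`, the `(class_name, width, height)` input format, and the choice of mean area as the size statistic are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def class_priors(annotations):
    """Compute simple per-class priors from box annotations.

    `annotations` is an iterable of (class_name, width, height) tuples with
    positive width and height. This input format is hypothetical and chosen
    only to keep the example self-contained.
    """
    # Group box dimensions by class.
    grouped = defaultdict(list)
    for cls, width, height in annotations:
        grouped[cls].append((width, height))

    priors = {}
    for cls, boxes in grouped.items():
        priors[cls] = {
            # How often the class occurs (rare classes have small counts).
            "num_instances": len(boxes),
            # Mean box area, used here as an assumed "size" statistic.
            "mean_area": mean(w * h for w, h in boxes),
            # Mean width-to-height ratio.
            "mean_aspect_ratio": mean(w / h for w, h in boxes),
        }
    return priors


if __name__ == "__main__":
    example = [("cat", 10, 20), ("cat", 30, 10), ("dog", 4, 4)]
    for name, stats in class_priors(example).items():
        print(name, stats)
```

Statistics like these are cheap to derive from annotations a detection dataset already contains, which is consistent with the abstract's point that no extra data collection is needed. How PriorBox actually turns such priors into calibration factors is described in the thesis itself and is not reproduced here.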

Suggestions

Object recognition and segmentation via shape models
Altınoklu, Metin Burak; Ulusoy, İlkay; Tarı, Zehra Sibel; Department of Electrical and Electronics Engineering (2016)
In this thesis, the problem of object detection, recognition and segmentation in computer vision is addressed with shape based methods. An efficient object detection method based on a sparse skeleton has been proposed. The proposed method is an improved chamfer template matching method for recognition of articulated objects. Using a probabilistic graphical model structure, shape variation is represented in a skeletal shape model, where nodes correspond to parts consisting of lines and edges correspond to pa...
Quantifying and mitigating class imbalance in long-tailed visual recognition
Baltacı, Zeynep Sonat; Kalkan, Sinan; Akbaş, Emre; Department of Computer Engineering (2022-7)
Objects are distributed unevenly in the real world, which manifests itself as a long-tailed distribution in realistic visual recognition datasets. Deep learning based approaches trained on such imbalanced datasets using conventional gradient-based training strategies exhibit unfair recognition performances towards classes that are under-represented in the dataset. This so-called class imbalance has been studied in the literature by measuring imbalance via either class frequency or class hardness, and using thos...
Recursive shortest spanning tree algorithms for image segmentation
Bayramoglu, NY; Bazlamaçcı, Cüneyt Fehmi (2005-11-24)
Image segmentation has an important role in image processing and the speed of the segmentation algorithm may become a drawback for some applications. This study analyzes the run time performances of some variations of the Recursive Shortest Spanning Tree Algorithm (RSST) and proposes simple but effective modifications on these algorithms to improve their speeds. In addition, the effect of link weight cost function on the run time performance and the segmentation quality is examined. For further improvement ...
Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?
KRÜGER, Norbert; JANSSEN, Peter; Kalkan, Sinan; LAPPE, Markus; LEONARDIS, Ales; PIATER, Justus; Rodriguez-Sanchez, Antonio J.; WISKOTT, Laurenz (Institute of Electrical and Electronics Engineers (IEEE), 2013-08-01)
Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Or...
Visual object detection and tracking using local convolutional context features and recurrent neural networks
Kaya, Emre Can; Alatan, Abdullah Aydın; Department of Electrical and Electronics Engineering (2018)
Visual object detection and tracking are two major problems in computer vision which have important real-life application areas. During the last decade, Convolutional Neural Networks (CNNs) have received significant attention and outperformed methods that rely on handcrafted representations in both detection and tracking. On the other hand, Recurrent Neural Networks (RNNs) are commonly preferred for modeling sequential data such as video sequences. A novel convolutional context feature extension is introduc...
Citation Formats
A. Dursun, “PriorBox: long-tail calibration with priors,” M.S. - Master of Science, Middle East Technical University, 2022.