Quantifying and mitigating class imbalance in long-tailed visual recognition

Date

2022-7

Author

Baltacı, Zeynep Sonat

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Objects are distributed unevenly in the real world, which manifests itself as a long-tailed distribution in realistic visual recognition datasets. Deep learning-based approaches trained on such imbalanced datasets using conventional gradient-based training strategies exhibit unfair recognition performance towards classes that are under-represented in the dataset. This so-called class imbalance has been studied in the literature by measuring imbalance via either class frequency or class hardness, and by using those measures to mitigate imbalance through sampling, loss weighting, or calibration strategies. In this thesis, we argue and empirically show that sample frequency or hardness alone is not sufficient for capturing imbalance among classes. We then propose a novel measure based on the predictive uncertainty of a trained deep network and demonstrate that it captures imbalance better than existing approaches. Finally, we incorporate our measure into existing imbalance mitigation methods: loss reweighting, resampling, margin-based methods, and two-stage training. We show that predictive uncertainty-based methods improve over or perform on par with existing baselines on the long-tailed datasets CIFAR-10-LT, CIFAR-100-LT, and ImageNet-LT.
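
As an illustration of how a per-class uncertainty score could drive loss reweighting, the following is a minimal sketch and not the thesis's actual method. The abstract does not say how predictive uncertainty is estimated, so the sketch assumes the entropy of the softmax output as a stand-in. The averaging per class, the normalization of weights to a mean of 1, and all function names are likewise hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def class_uncertainty(logits_per_sample, labels, num_classes):
    """Mean predictive entropy of the samples belonging to each class.

    ASSUMPTION: softmax entropy stands in for predictive uncertainty;
    the thesis may use a different estimator.
    """
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for logits, y in zip(logits_per_sample, labels):
        sums[y] += entropy(softmax(logits))
        counts[y] += 1
    # Classes with no samples get an uncertainty of 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def uncertainty_weights(uncertainties, eps=1e-8):
    """Turn per-class uncertainties into loss weights with mean 1.

    ASSUMPTION: weights proportional to uncertainty is a hypothetical
    weighting scheme chosen for illustration.
    """
    n = len(uncertainties)
    total = sum(uncertainties) + eps * n
    return [n * (u + eps) / total for u in uncertainties]


def weighted_cross_entropy(logits, label, weights):
    """Cross-entropy of one sample, scaled by the weight of its class."""
    probs = softmax(logits)
    return -weights[label] * math.log(probs[label])


# Toy outputs of an already trained network: class 0 is predicted
# confidently, class 1 is predicted with high uncertainty.
toy_logits = [
    [8.0, 0.0, 0.0],
    [7.0, 0.5, 0.0],
    [0.2, 0.5, 0.1],
    [0.0, 0.3, 0.4],
    [0.1, 0.0, 3.0],
]
toy_labels = [0, 0, 1, 1, 2]

toy_uncertainty = class_uncertainty(toy_logits, toy_labels, num_classes=3)
toy_weights = uncertainty_weights(toy_uncertainty)
toy_loss = weighted_cross_entropy(toy_logits[2], toy_labels[2], toy_weights)
```

In this toy setting the class whose samples receive near-uniform predictions ends up with the largest weight, regardless of how many samples it has. This is the sense in which an uncertainty-based measure can differ from a purely frequency-based one.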

Subject Keywords

Long-tailed visual recognition, class imbalance, predictive uncertainty, imbalance mitigation

URI

https://hdl.handle.net/11511/98144

Collections

Graduate School of Natural and Applied Sciences, Thesis

Citation Formats

Z. S. Baltacı, “Quantifying and mitigating class imbalance in long-tailed visual recognition,” M.S. - Master of Science, Middle East Technical University, 2022.