Detection of clean samples in noisy labelled datasets via analysis of artificially corrupted samples

Date

2022-8-22

Author

Yıldırım, Botan

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Recent advances in supervised deep learning have achieved great success in image classification, but these methods are known to owe their success to massive amounts of reliably labelled data. However, constructing large-scale datasets inevitably introduces varying levels of label noise, which degrades the performance of supervised deep learning based classifiers. In this thesis, we analyze sample selection based label noise robust approaches through extensive experimental evaluation. First, the adverse effects of memorizing noisy samples are investigated using the results of a base model. Second, the importance of knowing the noise rate is analyzed for approaches that rely on a prior on the noise rate. Third, the superiority of recent semi-supervised robust approaches over supervised ones is demonstrated. Additionally, synthetically corrupted, controlled datasets are used to show the effect of the noise rate on training performance. Finally, a new framework is proposed to classify samples as clean or noisy by investigating training loss dynamics. To avoid heavily tuned parameters during clean sample detection, the proposed framework artificially corrupts a noisy dataset and uses these artificially corrupted samples in a clean/noisy voting process. Moreover, following recent semi-supervised learning based label noise robust methods, the framework applies semi-supervised and contrastive learning after classifying the samples as clean or noisy. The effect of co-training during semi-supervised learning is also investigated, and its effectiveness is demonstrated.
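The abstract does not give the details of the corruption or voting steps, so the following is only a minimal sketch of the general idea, not the thesis's actual procedure. It assumes symmetric (uniform) label flipping as the artificial corruption, and a simple per-epoch rule in which a sample earns a "clean" vote when its training loss falls below a low quantile of the losses of the artificially corrupted samples. The function names `corrupt_labels` and `vote_clean`, and the parameters `quantile` and `min_vote_share`, are hypothetical.

```python
import random


def corrupt_labels(labels, num_classes, rate, seed=0):
    """Flip a random `rate` fraction of labels to a different class.

    Assumption: symmetric noise. Returns the corrupted label list and the
    set of indices that were flipped (the known-noisy reference samples).
    """
    rng = random.Random(seed)
    n = len(labels)
    flip_idx = set(rng.sample(range(n), round(rate * n)))
    corrupted = list(labels)
    for i in sorted(flip_idx):
        # An offset in [1, num_classes - 1] guarantees a different class.
        corrupted[i] = (labels[i] + rng.randrange(1, num_classes)) % num_classes
    return corrupted, flip_idx


def vote_clean(loss_history, corrupted_idx, quantile=0.05, min_vote_share=0.5):
    """Vote each sample clean or noisy from per-epoch training losses.

    loss_history: list of epochs, each a list of per-sample losses.
    Assumption: in each epoch a sample gets a clean vote if its loss is
    strictly below the `quantile` of the artificially corrupted losses.
    """
    n = len(loss_history[0])
    votes = [0] * n
    for losses in loss_history:
        reference = sorted(losses[i] for i in corrupted_idx)
        threshold = reference[int(quantile * (len(reference) - 1))]
        for i, loss in enumerate(losses):
            votes[i] += loss < threshold
    return [v / len(loss_history) >= min_vote_share for v in votes]


if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]
    noisy_labels, flipped = corrupt_labels(labels, num_classes=10, rate=0.2)
    print(len(flipped), "labels were artificially corrupted")
```

Because the thresholds come from samples whose labels are known to be wrong, the rule needs no estimate of the true noise rate, which matches the motivation stated above of avoiding heavily tuned parameters. In this sketch the artificially corrupted samples are voted on like any other sample; a real pipeline would presumably handle them separately.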

Subject Keywords

Noisy labelled classification dataset, Clean labelled sample extraction, Classifier neural networks, Deep learning, Semi-supervised learning, Contrastive learning, Co-training

URI

https://hdl.handle.net/11511/98771

Collections

Graduate School of Natural and Applied Sciences, Thesis

Citation Formats

B. Yıldırım, “Detection of clean samples in noisy labelled datasets via analysis of artificially corrupted samples,” M.S. - Master of Science, Middle East Technical University, 2022.