Self-supervised learning for unsupervised image classification and supervised localization tasks
Download: Melih_Baydar_PhD_thesis.pdf
Date
2024-07
Author
Baydar, Melih
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Item Usage Stats: 58 views, 182 downloads
Abstract
Recent self-supervised learning methods, in which instance discrimination is a fundamental pretraining task for convolutional neural networks (CNNs), excel in transfer learning. While instance discrimination is effective for classification thanks to its image-level learning, it lacks dense representation learning, making it sub-optimal for localization tasks such as object detection. In the first part of this thesis, we aim to mitigate this shortcoming by extending the instance discrimination task to learn dense representations alongside image-level representations. By adding a segmentation branch, parallel to the image-level learning branch, that predicts class-agnostic masks, we enhance the location-awareness of the representations. Our approach improves performance on localization tasks, achieving up to 1.7% AP improvement on PASCAL VOC, 0.8% AP on COCO object detection, 0.8% AP on COCO instance segmentation, and 3.6% mIoU on PASCAL VOC semantic segmentation. In recent years, Vision Transformers (ViTs) have significantly advanced deep learning models, boosting performance on traditional computer vision tasks and driving substantial progress in self-supervised learning. In the second part of this thesis, we propose UCLS, an unsupervised image classification framework that leverages the improved feature representations and superior nearest-neighbor performance of self-supervised ViTs. We incrementally enhance baseline methods for unsupervised image classification and further propose a cluster-ensembling methodology and a self-training step to better utilize multi-head classifiers. Extensive experiments demonstrate that UCLS achieves state-of-the-art performance on ten image classification benchmarks in fully unsupervised settings, with 99.3% clustering accuracy on CIFAR10, 89% on CIFAR100, and over 70% on ImageNet.
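The cluster-ensembling idea from the second part can be illustrated with a minimal sketch (hypothetical helper names; the thesis's actual method may differ): because each clustering head assigns arbitrary cluster ids, the heads are first aligned to a reference head by greedily matching each cluster to the reference cluster it overlaps most, and the consensus assignment is then a per-sample majority vote over the aligned heads.

```python
import numpy as np

def align_labels(ref, other, k):
    # Build a k x k contingency table counting co-occurrences, then
    # greedily map each cluster id in `other` to the reference id it
    # overlaps most (a cheap stand-in for Hungarian matching).
    table = np.zeros((k, k), dtype=int)
    for r, o in zip(ref, other):
        table[o, r] += 1
    mapping = table.argmax(axis=1)
    return mapping[other]

def ensemble_heads(head_assignments, k):
    # Align every head to the first one, then take a majority vote
    # over heads for each sample.
    ref = head_assignments[0]
    aligned = [ref] + [align_labels(ref, h, k) for h in head_assignments[1:]]
    aligned = np.stack(aligned)  # shape: (num_heads, num_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col, minlength=k).argmax(), 0, aligned)

# Toy example: three heads over four samples, two clusters.
heads = [np.array([0, 0, 1, 1]),
         np.array([1, 1, 0, 0]),   # same partition, labels permuted
         np.array([0, 0, 1, 0])]   # disagrees on the last sample
print(ensemble_heads(heads, k=2))  # -> [0 0 1 1]
```

The consensus labels produced this way could then seed a self-training step, treating high-agreement samples as pseudo-labels for a final classifier.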
Subject Keywords
Self-supervised learning, Contrastive learning, Non-contrastive learning, Instance discrimination task, Semantic segmentation, Object detection, Unsupervised image classification, Deep clustering
URI
https://hdl.handle.net/11511/110517
Collections
Graduate School of Natural and Applied Sciences, Thesis
Citation Formats
IEEE
M. Baydar, “Self-supervised learning for unsupervised image classification and supervised localization tasks,” Ph.D. - Doctoral Program, Middle East Technical University, 2024.