Self-supervised learning for unsupervised image classification and supervised localization tasks

2024-7
Baydar, Melih
Recent self-supervised learning methods, in which instance discrimination is a fundamental pretraining task for convolutional neural networks (CNNs), excel in transfer learning. Instance discrimination is effective for classification because it learns image-level representations. However, it does not learn dense representations, which makes it sub-optimal for localization tasks such as object detection. In the first part of this thesis, we mitigate this shortcoming of the instance discrimination task by extending it to learn dense representations alongside image-level representations. We add a segmentation branch that runs parallel to the image-level branch and predicts class-agnostic masks, which enhances the location-awareness of the learned representations. Our approach improves performance on localization tasks, achieving improvements of up to 1.7% AP on PASCAL VOC, 0.8% AP on COCO object detection, 0.8% AP on COCO instance segmentation, and 3.6% mIoU on PASCAL VOC semantic segmentation. In recent years, Vision Transformers (ViTs) have significantly advanced deep learning, boosting performance on traditional computer vision tasks and driving substantial progress in self-supervised learning. In the second part of this thesis, we propose UCLS, an unsupervised image classification framework that leverages the improved feature representations and superior nearest-neighbor performance of self-supervised ViTs. We incrementally enhance baseline methods for unsupervised image classification. We also propose a cluster ensembling methodology and a self-training step to make better use of multi-head classifiers. Extensive experiments demonstrate that UCLS achieves state-of-the-art performance on ten image classification benchmarks in fully unsupervised settings, with clustering accuracies of 99.3% on CIFAR10, 89% on CIFAR100, and over 70% on ImageNet.
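The clustering accuracy figures reported in the abstract are assumed here to follow the standard definition used in unsupervised image classification. Cluster IDs produced without labels are arbitrary, so predictions are scored under the best one-to-one mapping from clusters to ground-truth classes. The following minimal sketch of that metric is not code from the thesis. The function name `clustering_accuracy` is hypothetical, and the brute-force search over permutations is only practical for a handful of clusters. At the scale of CIFAR100 or ImageNet, the same matching is normally solved with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Hypothetical helper: fraction of samples labeled correctly under the
    best one-to-one mapping from predicted cluster IDs to class IDs.

    Brute force over all mappings, so only feasible for a few clusters;
    large-scale evaluation would use the Hungarian algorithm instead.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Co-occurrence counts of (predicted cluster, true class) pairs.
    counts = Counter(zip(y_pred, y_true))
    # Pad with placeholder slots when there are more clusters than classes,
    # so every cluster can still be assigned a distinct target.
    n = max(len(classes), len(clusters))
    targets = classes + [None] * (n - len(classes))
    best = 0
    # Try every injective assignment of clusters to targets.
    for assignment in permutations(targets, len(clusters)):
        matched = sum(counts[(c, t)] for c, t in zip(clusters, assignment))
        best = max(best, matched)
    return best / len(y_true)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [1, 1, 0, 0, 2, 0]  # cluster IDs are permuted relative to classes
    print(clustering_accuracy(y_true, y_pred))  # → 0.8333333333333334
```

In the example, the best mapping sends cluster 1 to class 0, cluster 0 to class 1, and cluster 2 to class 2, matching 5 of 6 samples. A pure relabeling of the classes therefore scores 1.0, which is why this metric is used rather than raw label agreement.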
Citation Formats
M. Baydar, “Self-supervised learning for unsupervised image classification and supervised localization tasks,” Ph.D. - Doctoral Program, Middle East Technical University, 2024.