EFFICIENT PRETRAINING OF VISION TRANSFORMERS: A LAYER-FREEZING APPROACH WITH LOCAL MASKED IMAGE MODELING
Download
UTKU_MMI_ODTU_TEZ.pdf
Date
2024-09-03
Author
Topçuoğlu, Utku Mert
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This thesis accelerates the self-supervised pre-training of Vision Transformers (ViTs) by combining progressive layer freezing with local masked image modeling. It targets the heavy computational cost and long training times of pre-training ViTs with self-supervised methods such as masked image modeling. The core contribution is the integration of the FreezeOut method into the LocalMIM architecture, which improves training efficiency by systematically freezing specific layers at scheduled points during training. Because learning rate scheduling is optimizer-dependent, we also evaluate whether FreezeOut remains as effective as reported in the original paper across different optimizers. Our experiments show that the proposed approach reduces training time by approximately 12.5% with a minimal drop in top-1 accuracy (0.6%). Furthermore, we introduce and validate a novel learning rate scheduling method tailored to ViTs, which reduces the accuracy drop to 0.1% and reaches 83.1% top-1 accuracy. We show that the number of training epochs and the complexity of the dataset are critical to the effectiveness of FreezeOut, which performs better with more training epochs or simpler datasets. Our learning rate scheduling method is more robust to fewer training epochs and more complex datasets, which explains its superior results in the 100-epoch IN-1K training setup. This research offers a way to make ViT pre-training more efficient and self-supervised learning more accessible where computational resources are constrained. More broadly, the findings highlight the potential of progressive layer freezing and adaptive learning rate scheduling for optimizing ViT training.
The implementation of our approach is accessible here: https://github.com/utkutpcgl/ViTFreeze.
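To make the idea of progressive layer freezing concrete, the following is a minimal sketch of a FreezeOut-style schedule. It assumes linearly spaced freeze points and a per-layer cosine-annealed learning rate that reaches zero at the layer's freeze point. It is not taken from the thesis or its repository: the function names (`freeze_points`, `layer_lr`, `active_layers`), the default `t0 = 0.5`, and the five-layer example are all hypothetical choices for illustration, and the ViT-specific scheduling method introduced in the thesis is not reproduced here.

```python
import math


def freeze_points(num_layers: int, t0: float = 0.5) -> list[float]:
    """Return one freeze time per layer as a fraction of total training.

    Assumption: the first layer freezes at t0 and the last at 1.0,
    with the layers in between spaced linearly.
    """
    if num_layers == 1:
        return [1.0]
    step = (1.0 - t0) / (num_layers - 1)
    return [t0 + i * step for i in range(num_layers)]


def layer_lr(base_lr: float, t: float, t_freeze: float) -> float:
    """Cosine-annealed learning rate that decays to zero at t_freeze.

    t is the fraction of training completed so far. Once t reaches
    t_freeze the layer is treated as frozen and its rate stays at zero.
    """
    if t >= t_freeze:
        return 0.0
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / t_freeze))


def active_layers(t: float, points: list[float]) -> list[int]:
    """Indices of layers that are still being trained at time t."""
    return [i for i, p in enumerate(points) if t < p]


if __name__ == "__main__":
    points = freeze_points(num_layers=5, t0=0.5)
    for t in (0.0, 0.25, 0.5, 0.7, 1.0):
        rates = [round(layer_lr(1e-3, t, p), 6) for p in points]
        print(f"t={t:.2f} active={active_layers(t, points)} lr={rates}")
```

In a real training loop, a frozen layer would also be excluded from the backward pass (for example by disabling gradients on its parameters), which is where the training-time saving comes from. The sketch only computes the schedule.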
Subject Keywords
Vision Transformers, Self-Supervised Learning, Local Masked Image Modeling, Progressive Layer Freezing, Computational Efficiency, Multi-Scale Reconstruction, Adaptive Learning Rate Scheduling, FreezeOut Learning Rate Scheduling, Training Time Reduction
URI
https://hdl.handle.net/11511/111029
Collections
Graduate School of Informatics, Thesis
Citation Formats
IEEE
U. M. Topçuoğlu, “EFFICIENT PRETRAINING OF VISION TRANSFORMERS: A LAYER-FREEZING APPROACH WITH LOCAL MASKED IMAGE MODELING,” M.S. - Master of Science, Middle East Technical University, 2024.