OpenMETU
Local Masking Meets Progressive Freezing: Crafting Efficient Vision Transformers for Self-Supervised Learning
Date
2025-01-01
Author
Topçuoğlu, Utku Mert
Akagündüz, Erdem
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
This paper presents an innovative approach to self-supervised learning for Vision Transformers (ViTs), integrating local masked image modeling with progressive layer freezing. This method enhances the efficiency and speed of initial layer training in ViTs. By systematically freezing specific layers at strategic points during training, we reduce computational demands while maintaining learning capabilities. Our approach employs a novel multi-scale reconstruction process that fosters efficient learning in initial layers and enhances semantic comprehension across scales. The results demonstrate a substantial reduction in training time (12.5%) with a minimal impact on model accuracy (decrease in top-1 accuracy by 0.6%). Our method achieves top-1 and top-5 accuracies of 82.6% and 96.2%, respectively, underscoring its potential in scenarios where computational resources and time are critical. The implementation of our approach is available at our project's GitHub repository: https://github.com/utkutpcgl/ViTFreeze.
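The progressive layer freezing described in the abstract can be illustrated with a small sketch: a schedule that, as training advances, marks an increasing number of the earliest transformer blocks as frozen (excluded from gradient updates), which is what reduces the computational cost of initial-layer training. This is a minimal framework-agnostic sketch under assumed hyperparameters (a linear ramp over the first half of training up to half the blocks), not the paper's actual schedule; see the authors' repository at https://github.com/utkutpcgl/ViTFreeze for the real implementation.

```python
class Block:
    """Stand-in for a ViT transformer block; in PyTorch this flag would be
    set on each parameter via p.requires_grad_(False)."""
    def __init__(self):
        self.requires_grad = True


def frozen_block_count(epoch, total_epochs, num_blocks, final_frac=0.5):
    """Number of leading blocks to freeze at a given epoch.

    Assumed schedule (illustrative): ramp linearly from 0 frozen blocks to
    final_frac * num_blocks over the first half of training, then hold.
    """
    target = int(num_blocks * final_frac)
    ramp_epochs = max(1, total_epochs // 2)
    return min(target, epoch * target // ramp_epochs)


def apply_progressive_freezing(blocks, epoch, total_epochs, final_frac=0.5):
    """Freeze the earliest blocks according to the schedule; later blocks
    keep learning, preserving the model's overall learning capability."""
    n_frozen = frozen_block_count(epoch, total_epochs, len(blocks), final_frac)
    for i, blk in enumerate(blocks):
        blk.requires_grad = i >= n_frozen  # freeze the first n_frozen blocks
    return n_frozen
```

In a real training loop this would run at the start of each epoch, and the optimizer would skip (or drop) the frozen parameters so their gradients are never computed, which is where the reported ~12.5% training-time saving comes from.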
Subject Keywords
Computer Vision, Deep Learning, Masked Image Modeling, Self-Supervised Learning, Training Efficiency
URI
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105000170796&origin=inward
https://hdl.handle.net/11511/114317
DOI
https://doi.org/10.1117/12.3055190
Conference Name
17th International Conference on Machine Vision, ICMV 2024
Collections
Graduate School of Informatics, Conference / Seminar
Citation Formats
IEEE
U. M. Topçuoğlu and E. Akagündüz, “Local Masking Meets Progressive Freezing: Crafting Efficient Vision Transformers for Self-Supervised Learning,” Edinburgh, United Kingdom, 2025, vol. 13517, Accessed: 00, 2025. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105000170796&origin=inward.