Refurbished embeddings with integrated guidance networks for efficient context-length scaling
Date
2025-08-27
Author
Çavuşoğlu, Ali Devrim Ekin
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This thesis presents REIGN (Refurbished Embeddings with Integrated Guidance Networks), a framework for efficient representation learning of very long documents. Unlike Transformer-based encoders constrained by token length, REIGN adopts a hierarchical approach: a frozen Guidance Network (GN) first produces fixed-size chunk embeddings, which are then aggregated by a lightweight encoder trained with a SimCLR-style contrastive objective. This decoupled design enables semantic understanding of documents spanning hundreds of thousands of tokens without backpropagating through token-level models. On the GoodWiki-Long benchmark, REIGN improves retrieval quality by 3 percentage points (pp) in nDCG@10 over truncated GN baselines, while also offering significant computational savings. A caching mechanism further accelerates training by reusing GN embeddings, making REIGN highly effective for scalable document retrieval under resource constraints.
Subject Keywords
Long Document Embeddings, Contrastive Learning, Hierarchical Encoding, Guidance Networks, Document Retrieval, Large Context NLP, Token-Free Training, HDF5 Caching
URI
https://hdl.handle.net/11511/116043
Collections
Graduate School of Natural and Applied Sciences, Thesis
Citation Formats
A. D. E. Çavuşoğlu, “Refurbished embeddings with integrated guidance networks for efficient context-length scaling,” M.S. - Master of Science, Middle East Technical University, 2025.