Refurbished embeddings with integrated guidance networks for efficient context-length scaling

2025-08-27
Çavuşoğlu, Ali Devrim Ekin
This thesis presents REIGN (Refurbished Embeddings with Integrated Guidance Networks), a framework for efficient representation learning over very long documents. Whereas Transformer-based encoders are limited by a maximum input length in tokens, REIGN takes a hierarchical approach: a frozen Guidance Network (GN) first produces fixed-size embeddings for document chunks, and a lightweight encoder trained with a SimCLR-style contrastive objective then aggregates them. Because the two stages are decoupled, REIGN can represent documents spanning hundreds of thousands of tokens without backpropagating through the token-level model. On the GoodWiki-Long benchmark, REIGN improves nDCG@10 by 3 percentage points (pp) over truncated GN baselines while also reducing computational cost. A caching mechanism further accelerates training by reusing GN embeddings, which makes REIGN well suited to scalable document retrieval under resource constraints.
Citation Formats
A. D. E. Çavuşoğlu, “Refurbished embeddings with integrated guidance networks for efficient context-length scaling,” M.S. - Master of Science, Middle East Technical University, 2025.