IMPROVING VISION LLM PERFORMANCE ON STANDARDIZED TEST QUESTIONS

2025-08-29
Sert, Sefik Egemen
In our research, we show that open-source vision-language models can be trained to rival proprietary systems on complex, multimodal Turkish high-school exam questions, a domain where no benchmark previously existed. This thesis introduces the first standardized benchmark for evaluating Vision Language Models (VLMs) on the Turkish high-school curriculum. We present a manually curated dataset of 1,854 five-choice Yükseköğretim Kurumları Sınavı (YKS) questions, evenly sampled from 309 topics and designed to comprehensively test multimodal reasoning over complex, exam-style problems. We establish performance baselines by evaluating both open-source and proprietary VLMs, revealing a 23-point accuracy gap between the best proprietary model (Gemini 2.5 Flash, 84.68%) and the strongest open-source model (Qwen-2.5VL-32B, 62.46%). To close this gap, we curated three large-scale multimodal training datasets (D, M, L) totaling 161.4 million tokens, augmented with solutions from advanced models (Gemini 2.0, Gemini 2.5) and with video-assisted prompting for complex questions. Using our optimized QMSA (Question–Metadata–Solution–Answer) syntax, we fine-tuned Qwen-2.5VL-32B to 78.59% accuracy, a 25.82% relative improvement that narrows the gap to proprietary performance to 7.9%. This work delivers three contributions: (1) a publicly available benchmark for Turkish academic evaluation of VLMs, (2) a high-quality, domain-specific training dataset enabling competitive open-source performance, and (3) an empirical demonstration that data-centric fine-tuning can substantially close the open–proprietary performance gap. We also outline key challenges, such as spatial reasoning and domain-specific diagram interpretation, and propose targeted post-training, tool-assisted reasoning, and synthetic data generation as promising future directions.
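For readers who want to check the reported relative improvement, the arithmetic can be sketched as follows. This is a minimal illustration that assumes "relative improvement" means the accuracy gain divided by the baseline accuracy; the function name relative_improvement is hypothetical and does not come from the thesis.

```python
def relative_improvement(baseline_acc: float, tuned_acc: float) -> float:
    """Return the accuracy gain as a percentage of the baseline accuracy.

    Assumed definition (not stated explicitly in the abstract):
        (tuned - baseline) / baseline * 100
    """
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (tuned_acc - baseline_acc) / baseline_acc * 100.0


# Accuracies quoted in the abstract, in percent.
BASELINE_QWEN = 62.46    # Qwen-2.5VL-32B before fine-tuning
FINETUNED_QWEN = 78.59   # Qwen-2.5VL-32B after fine-tuning

gain = relative_improvement(BASELINE_QWEN, FINETUNED_QWEN)
print(f"relative improvement: {gain:.2f}%")
```

Under this definition, the two accuracies quoted above give the 25.82% figure stated in the abstract.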
Citation Formats
S. E. Sert, “IMPROVING VISION LLM PERFORMANCE ON STANDARDIZED TEST QUESTIONS,” M.S. - Master of Science, Middle East Technical University, 2025.