Fine-tuning medical large language models for differential diagnosis: from synthetic data to real-world evaluation

2026-1
Çavaş, Ezgi
Access to large-scale, annotated Electronic Health Records (EHR) is limited by privacy regulations, which poses a major obstacle to training strong clinical natural language processing (NLP) models. Synthetic data offers a way to protect privacy, but how well synthetic text supports fine-tuning large language models (LLMs) for real-world tasks remains an open question. This thesis presents a framework that uses synthetic patient summaries to fine-tune a medical LLM for multi-label disease diagnosis. The approach offers a cost-effective, privacy-focused method for building clinical diagnostic tools with minimal use of sensitive real-world data. The results show that synthetic data can successfully adapt medical LLMs to this diagnostic task. Such tools can also help hospitals that struggle with triage and patient overcrowding.
Citation Formats
E. Çavaş, “Fine-tuning medical large language models for differential diagnosis: from synthetic data to real-world evaluation,” M.S. thesis, Middle East Technical University, 2026.