LLM-Generated Rewrite and Context Modulation for Enhanced Vision Language Models in Digital Pathology
Date
2025-01-01
Author
Bahadir, Cagla Deniz
Akar, Gözde
Sabuncu, Mert R.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Abstract
Recent advancements in vision-language models (VLMs) have found important applications in medical imaging, particularly in digital pathology. VLMs demand large-scale datasets of image-caption pairs, which are often hard to obtain in medical domains. State-of-the-art VLMs in digital pathology have been pre-trained on datasets that are significantly smaller than their computer vision counterparts. Furthermore, the caption of a pathology slide often refers to a small subset of features in the image, an important point that is ignored in existing VLM pre-training schemes. Another important issue that is under-appreciated is that the performance of state-of-the-art VLMs in zero-shot classification tasks can be sensitive to the choice of prompts. In this paper, we first employ language rewrites using a large language model (LLM) to enrich a public pathology image-caption dataset and make it publicly available. Our extensive experiments demonstrate that by training with language rewrites, we can boost the performance of a state-of-the-art digital pathology VLM on downstream tasks such as zero-shot classification, and text-to-image and image-to-text retrieval. We further leverage LLMs to demonstrate the sensitivity of zero-shot classification results to the choice of prompts and propose a scalable approach to characterize this sensitivity when comparing models. Finally, we present a novel context modulation layer that adjusts the image embeddings to better align with the paired text and use context-specific language rewrites for training this layer. In our results, we show that the proposed context modulation framework can further yield substantial performance gains.
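The abstract does not specify the exact form of the context modulation layer, so the following is only a minimal sketch, assuming a CLIP-style pipeline in PyTorch and a FiLM-like scale-and-shift conditioning on a context vector (e.g., pooled from a context-specific language rewrite). The class and parameter names are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch: a context modulation layer that adjusts image embeddings
# so they align better with paired text. The FiLM-style gating below is an
# assumption; the paper's actual architecture is not described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextModulation(nn.Module):
    def __init__(self, embed_dim: int, context_dim: int):
        super().__init__()
        # Map a context embedding to a per-dimension scale and shift
        # that are applied to the image embedding.
        self.to_scale = nn.Linear(context_dim, embed_dim)
        self.to_shift = nn.Linear(context_dim, embed_dim)

    def forward(self, image_emb: torch.Tensor, context_emb: torch.Tensor) -> torch.Tensor:
        scale = 1.0 + torch.tanh(self.to_scale(context_emb))  # keep the modulation mild
        shift = self.to_shift(context_emb)
        modulated = scale * image_emb + shift
        # Re-normalize so the result can be used in a cosine-similarity
        # contrastive loss or zero-shot classification.
        return F.normalize(modulated, dim=-1)

# Usage example with random embeddings standing in for encoder outputs.
mod = ContextModulation(embed_dim=512, context_dim=512)
image_emb = F.normalize(torch.randn(8, 512), dim=-1)
context_emb = torch.randn(8, 512)
aligned_img_emb = mod(image_emb, context_emb)
```

In this sketch the modulated image embedding would replace the raw image embedding wherever similarities against text embeddings are computed; how the context vector is obtained and how the layer is trained with context-specific rewrites follow the paper, not this example.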
Subject Keywords
digital pathology, large language models, vision language models
URI
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105003640049&origin=inward
https://hdl.handle.net/11511/114832
DOI
https://doi.org/10.1109/wacv61041.2025.00042
Conference Name
2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025
Collections
Department of Electrical and Electronics Engineering, Conference / Seminar
Citation Formats
IEEE
C. D. Bahadir, G. Akar, and M. R. Sabuncu, “LLM-Generated Rewrite and Context Modulation for Enhanced Vision Language Models in Digital Pathology,” presented at the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, Arizona, United States, 2025, Accessed: 00, 2025. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105003640049&origin=inward.