Show/Hide Menu
Hide/Show Apps
Logout
Türkçe
Türkçe
Search
Search
Login
Login
OpenMETU
OpenMETU
About
About
Open Science Policy
Open Science Policy
Open Access Guideline
Open Access Guideline
Postgraduate Thesis Guideline
Postgraduate Thesis Guideline
Communities & Collections
Communities & Collections
Help
Help
Frequently Asked Questions
Frequently Asked Questions
Guides
Guides
Thesis submission
Thesis submission
MS without thesis term project submission
MS without thesis term project submission
Publication submission with DOI
Publication submission with DOI
Publication submission
Publication submission
Supporting Information
Supporting Information
General Information
General Information
Copyright, Embargo and License
Copyright, Embargo and License
Contact us
Contact us
SAMChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Small Scale Remote Sensing
Download
12.pdf
Date
2025-01-01
Author
Köksal, Aybora
Alatan, Abdullah Aydın
Metadata
Show full item record
This work is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
.
Item Usage Stats
56
views
29
downloads
Cite This
Remarkable capabilities in understanding and generating text-image content have been demonstrated by recent advancements in multimodal large language models (MLLMs). However, their effectiveness in specialized domains-particularly those requiring resource-efficient and domain-specific adaptations- has remained limited. In this work, a lightweight multimodal language model termed SAMChat is introduced, specifically adapted to analyze remote sensing imagery in secluded areas, including challenging missile launch sites. A new dataset, SAMData, was compiled by verifying hundreds of aerial images through expert review, and subtle military installations were highlighted via detailed captions. Supervised fine-tuning on a 2B-parameter open-source MLLM with chain-of-thought (CoT) reasoning annotations was performed, enabling more accurate and interpretable explanations. Additionally, Group Relative Policy Optimization (GRPO) was leveraged to enhance the model's ability to detect critical domain-specific cues-such as defensive layouts and key military structures-while minimizing false positives on civilian scenes. Through empirical evaluations, it has been shown that SAMChat significantly outperforms both larger, general-purpose multimodal models and existing remote sensing-adapted approaches on open-ended captioning and classification metrics. Over 80% recall and 98% precision were achieved on the newly proposed SAMData benchmark, underscoring the potency of targeted fine-tuning and reinforcement learning in specialized real-world applications. Code and dataset will be available upon acceptance.
Subject Keywords
Aerial image analysis
,
chain-of-thought reasoning
,
domain adaptation
,
group relative policy optimization
,
multimodal large language models
,
remote sensing
URI
https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105023136351&origin=inward
https://hdl.handle.net/11511/117688
Journal
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
DOI
https://doi.org/10.1109/jstars.2025.3637115
Collections
Department of Electrical and Electronics Engineering, Article
Citation Formats
IEEE
ACM
APA
CHICAGO
MLA
BibTeX
A. Köksal and A. A. Alatan, “SAMChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Small Scale Remote Sensing,”
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
, pp. 0–0, 2025, Accessed: 00, 2025. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=105023136351&origin=inward.