Cross-modal semi-dense image matching

2024-09
Tuzcuoğlu, Önder
In this thesis, we introduce a novel deep learning-based image matching method for cross-modal, cross-view local feature matching between thermal infrared (TIR) and visible-band images. Unlike visible-band images, TIR images are less susceptible to adverse lighting and weather conditions, but they are harder to match because of their weak texture and nonlinear intensity differences relative to the visible band. Existing hand-crafted and learning-based methods for visible-TIR matching fall short in handling diversity in viewpoint, scale, and texture. Moreover, there is a lack of datasets with sufficient ground truth for training deep learning models on the intensity and texture characteristics of thermal and visible imagery. To address these problems, our method adopts a two-stage training strategy: masked image modeling pre-training on real thermal and visible images, followed by fine-tuning with pseudo-thermal image augmentation to bridge the modality gap between the two bands. Furthermore, we introduce a refined matching pipeline that compensates for scale discrepancies and improves match reliability through sub-pixel level refinement. To validate the visible-TIR matching capability of our method across different viewpoints and scales, as well as its robustness to varying weather conditions, we collected a comprehensive visible-thermal dataset covering six diverse scenes under both cloudy and sunny weather. Finally, we show that our method outperforms existing image matching methods on multiple benchmarks.
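The abstract does not specify how pseudo-thermal image augmentation is performed. As a minimal illustrative sketch only (not the thesis's actual pipeline), one common way to mimic TIR appearance from an RGB image is to collapse it to a single channel with perturbed luminance weights, apply a nonlinear intensity remapping, and optionally invert intensities; the weights, gamma value, and inversion here are all hypothetical choices for illustration:

```python
import numpy as np

def pseudo_thermal(rgb, gamma=0.6, invert=True, seed=None):
    """Toy pseudo-thermal augmentation (illustrative, not the thesis's method).

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    Returns a single-channel (H, W) image whose intensity statistics
    loosely mimic thermal infrared imagery.
    """
    rng = np.random.default_rng(seed)
    # Randomly perturbed luminance weights stand in for unknown emissivity mixing.
    w = np.array([0.30, 0.59, 0.11]) + rng.uniform(-0.05, 0.05, 3)
    w = w / w.sum()
    gray = rgb @ w
    # Nonlinear remapping: TIR response is not linear in visible luminance.
    mapped = gray ** gamma
    if invert:
        # Hot objects often render bright in TIR; polarity varies by sensor.
        mapped = 1.0 - mapped
    return np.clip(mapped, 0.0, 1.0)
```

Such an augmentation would let a matcher trained on visible-band pairs see TIR-like intensity statistics without requiring aligned real thermal ground truth.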
Citation Formats
Ö. Tuzcuoğlu, “Cross-modal semi-dense image matching,” M.S. - Master of Science, Middle East Technical University, 2024.