Transformer Based Sensor Fusion and Pose Estimation in End-to-End Supervised Learning of Visual Inertial Odometry
Date
2024-9
Author
Kurt, Yunus Bilge
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
This thesis investigates the application of the Transformer architecture for temporal modeling in visual-inertial odometry (VIO) networks. The objective is to improve pose estimation accuracy by leveraging the attention mechanisms in Transformers, which make better use of historical data than the Long Short-Term Memory (LSTM) networks used in recent methods. The proposed method is end-to-end trainable and requires only monocular camera and IMU measurements during inference. We observe that latent visual-inertial features contain essential information for pose estimation, enabling Transformers to perform effective temporal updates from past measurements within a local window. To facilitate real-time deployment, all attention mechanisms are designed to work with causal masks. This thesis also explores the use of tokenization mechanisms for continuous data in time series prediction problems, and evaluates regression by classification in the odometry task. The study examines the impact of data uncertainty in supervised end-to-end odometry learning and considers specialized loss functions for the pose space. Experimental results demonstrate that Transformer-based architectures enhance the accuracy of monocular VIO networks, achieving results better than or comparable to state-of-the-art methods on standard odometry datasets.
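To illustrate what attention with a causal mask means, here is a minimal sketch in plain Python. It is not the thesis's implementation: the function name `causal_attention`, the single-head form, and the toy feature vectors are all assumptions made for illustration. It only shows that each time step attends to itself and to earlier steps, which is what allows such a model to run online.

```python
import math


def causal_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Hypothetical helper for illustration only. Each argument is a list of
    T feature vectors (lists of floats). The output at step t is a weighted
    average of values[0..t]; later steps are never visible.
    """
    dim = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Causal mask: score only the keys at positions <= t.
        scores = [
            sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        exps = [math.exp(x - peak) for x in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the visible value vectors.
        out = [
            sum(w * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy "latent features" for three time steps (made-up numbers).
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = causal_attention(feats, feats, feats)
```

Because of the mask, changing the features at the last time step leaves the outputs for the earlier steps unchanged. This is the property that matters for real-time use, where future measurements are not yet available.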
Subject Keywords
Visual inertial odometry, End-to-end odometry learning, Transformer
URI
https://hdl.handle.net/11511/111293
Collections
Graduate School of Natural and Applied Sciences, Thesis
Citation Formats
Y. B. Kurt, “Transformer Based Sensor Fusion and Pose Estimation in End-to-End Supervised Learning of Visual Inertial Odometry,” M.S. - Master of Science, Middle East Technical University, 2024.