A novel scalable global alignment method for 3D reconstruction

2025-04-07
Üstündaş, Şahin Umutcan
One of the core challenges in 3D Vision is the estimation of 3D scene geometry. Traditionally, this task has been tackled with well-established, time-tested methods such as Structure-from-Motion, a pipeline of simpler algorithms in which each algorithm handles a specific subtask. However, this modular design makes the overall pipeline susceptible to errors and noise, which propagate to subsequent modules. Although recent work has improved the accuracy of such pipelines, these problems persist. DUSt3R, a recent holistic method that addresses this issue, takes a pair of images as input and extracts information-rich structures called “pointmaps”. These pointmaps can then be used in downstream tasks such as camera parameter estimation, point matching, 3D reconstruction, and depth estimation. To handle multiple images, DUSt3R employs a global alignment method that processes the images in pairs and applies an optimization algorithm to place the pointmaps in a common coordinate frame. However, this alignment method has quadratic computational complexity in the number of images $N$. In this thesis, we propose a novel, scalable global alignment method that reduces the original $O(N^2)$ complexity to a theoretical upper bound of $O(km^2)$, where $m \ll N$ is a predetermined batch size and $k=N/m$ is the number of batches; since $km^2 = Nm$, this bound grows linearly in $N$ for a fixed $m$. This reduction in computational complexity can accelerate the adoption of DUSt3R-based methods as a modern general-purpose 3D Vision tool. Our results show a substantial decrease in memory and time requirements, consistent with our theoretical upper bound; additionally, the Relative Translation Accuracy (RTA) and Relative Rotation Accuracy (RRA) metrics show that our method performs comparably to DUSt3R and its variants.
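To make the scaling argument in the abstract concrete, the sketch below counts image pairs under two schemes: all-pairs alignment over $N$ images, and alignment restricted to pairs inside disjoint batches of size $m$. The abstract does not describe how batches are formed or how they are linked to each other, so the disjoint-batch assumption and the function names `full_pair_count` and `batched_pair_count` are illustrative only and are not taken from the thesis.

```python
def full_pair_count(n: int) -> int:
    """Number of unordered image pairs among n images: n(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2


def batched_pair_count(n: int, m: int) -> int:
    """Number of unordered pairs when n images are split into disjoint
    batches of size m and pairs are formed only within each batch.

    Assumption (not from the thesis): batches are disjoint, and any work
    needed to align batches with each other is ignored here.
    """
    full_batches, remainder = divmod(n, m)
    # Each full batch contributes m(m-1)/2 pairs; a final partial batch
    # (if n is not a multiple of m) contributes its own smaller count.
    return full_batches * full_pair_count(m) + full_pair_count(remainder)


if __name__ == "__main__":
    n, m = 1000, 10
    print("all pairs:    ", full_pair_count(n))
    print("batched pairs:", batched_pair_count(n, m))
```

Under these assumptions, the batched count is at most $k \cdot m^2$ with $k = N/m$, which is the $O(km^2)$ form quoted in the abstract.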
Citation Formats
Ş. U. Üstündaş, “A novel scalable global alignment method for 3D reconstruction,” M.S. - Master of Science, Middle East Technical University, 2025.