Visual-inertial sensor fusion for 3D urban modeling

Sırtkaya, Salim
In this dissertation, a real-time, autonomous and geo-registered approach is presented to tackle the large scale 3D urban modeling problem using a camera and inertial sensors. The proposed approach exploits the special structures of urban areas and visual-inertial sensor fusion. The buildings in urban areas are assumed to have planar facades that are perpendicular to the local level. A sparse 3D point cloud of the imaged scene is obtained from visual feature matches using camera poses estimates, and planar patches are obtained by an iterative Hough Transform on the 2D projection of the sparse 3D point cloud in the direction of gravity. The result is a compact and dense depth map of the building facades in terms of planar patches. The plane extraction is performed on sequential frames and a complete model is obtained by plane fusion. Inertial sensor integration helps to improve camera pose estimation, 3D reconstruction and planar modeling stages. For camera pose estimation, the visual measurements are integrated with the inertial sensors by means of an indirect feedback Kalman filter. This integration helps to get reliable and geo-referenced camera pose estimates in the absence of GPS. The inertial sensors are also used to filter out spurious visual feature matches in the 3D reconstruction stage, find the direction of gravity in plane search stage, and eliminate out of scope objects from the model using elevation data. The visual-inertial sensor fusion and urban heuristics utilization are shown to outperform the classical approaches for large scale urban modeling in terms of consistency and real-time applicability.