GRAPH-BASED HIERARCHICAL TRACKLET MERGE FOR MULTIPLE OBJECT TRACKING

2024-4-05
Bilgi, Halil Çağrı
The past decade has seen significant advancements in multi-object tracking, particularly with the rise of deep learning. However, many studies in online tracking have primarily focused on enhancing track management or extracting visual features, often leading to hybrid approaches with limited effectiveness, especially in scenarios with severe occlusions or crowded scenes. Conversely, in offline tracking, there has been a lack of emphasis on robust motion cues. This thesis proposes a novel solution to offline tracking by hierarchically merging tracklets, leveraging recent promising learning-based architectures. Our approach integrates motion cues and social interactions among targets using a joint Transformer and Graph Neural Network (GNN) encoder. The proposed solution is an end-to-end trainable model that does not require any handcrafted short-term or long-term matching processes. By representing tracklets across multiple frames using a graph structure, we enable collective reasoning of targets across different timestamps, leveraging advancements in graph representation learning. Furthermore, the Transformer encoder effectively captures the motion of each tracklet. By enabling bi-directional information propagation between these modalities, namely Transformer and GNN, we allow motion modeling to depend on interactions and, conversely, interaction modeling to depend on the motion of each target. Experimental results demonstrate the effectiveness of our approach, indicating that graph representation learning equipped with a joint Transformer encoder achieves results comparable to the state-of-the-art algorithms. These promising results emphasize the potential of the joint Transformer-GNN encoder architecture in multi-object tracking.
Citation Formats
H. Ç. Bilgi, “GRAPH-BASED HIERARCHICAL TRACKLET MERGE FOR MULTIPLE OBJECT TRACKING,” M.S. - Master of Science, Middle East Technical University, 2024.