Event Boundary Detection Using Audio-Visual Features and Web-casting Texts with Imprecise Time Information

ALAN, Özgür
SAMET, Akpınar
ORKUNT, Sabuncu
Çiçekli, Fehime Nihan
Alpaslan, Ferda Nur
We propose a method for detecting events and their boundaries in soccer videos using web-casting texts and audio-visual features. Web-casting texts list the events of a match, but their time information is imprecise, so each event must be aligned with the visual content of the video. We address this alignment problem by combining textual, visual, and audio features. Existing methods assume that the time at which an event occurs is given precisely, in seconds. However, most web-casting texts published by popular organizations such as uefa.com (the official site of the Union of European Football Associations) give the time in minutes rather than seconds. Our method is robust to this uncertainty in the time points of the events. Our experiments indicate that the method detects event boundaries satisfactorily for web-casting texts with uncertain time information, and that using audio-visual features improves the performance of event boundary detection.
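To make the alignment problem concrete, the following is a minimal illustrative sketch, not the method of the paper. It assumes that a web-casting entry stamped with minute m refers to some moment inside the m-th minute of play, that the kickoff position in the video is known, and that a per-second audio energy track is available. The function names, the slack parameter, and the choice of the loudest second as the event anchor are all hypothetical.

```python
from typing import Sequence, Tuple


def minute_to_window(minute: int,
                     kickoff_offset_s: float = 0.0,
                     slack_s: float = 30.0) -> Tuple[float, float]:
    """Map a match minute (1-based) to a search window in video seconds.

    Assumption: an event stamped "minute m" happened somewhere in the
    m-th minute of play; slack_s widens the window on both sides to
    absorb further imprecision in the text.
    """
    start = kickoff_offset_s + (minute - 1) * 60.0 - slack_s
    end = kickoff_offset_s + minute * 60.0 + slack_s
    return max(start, 0.0), end


def locate_event(audio_energy: Sequence[float],
                 window: Tuple[float, float]) -> int:
    """Return the second inside the window with the highest audio energy.

    Assumption: audio_energy[t] is the energy of second t of the video.
    Taking the loudest second is a stand-in for a real audio-visual
    detector, used here only to show how the window narrows the search.
    """
    lo = int(window[0])
    hi = min(int(window[1]), len(audio_energy))
    if lo >= hi:
        raise ValueError("search window lies outside the audio track")
    return max(range(lo, hi), key=lambda t: audio_energy[t])
```

With a kickoff offset of 100 seconds and 30 seconds of slack, an event reported at minute 23 is searched for between video seconds 1390 and 1510, rather than at a single assumed instant.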