Metadata extraction from text in soccer domain

Göktürk, Özkan
Video databases and content-based retrieval in these databases have become popular with improvements in technology. Metadata extraction techniques are used to provide descriptive data for video content. One popular metadata extraction technique for multimedia is information extraction from text. For some domains, such as soccer, movies, and news, it is possible to find text accompanying the video. In this thesis, we present an approach to metadata extraction from match reports for the soccer domain. The UEFA Cup and UEFA Champions League match reports are downloaded from the UEFA web site by a web crawler. These match reports are preprocessed using regular expressions, and important events are then extracted using hand-written rules. In addition to the hand-written rules, two different machine learning techniques are applied to the match corpus to learn event patterns and extract match events automatically. Extracted events are saved in an MPEG-7 file. A user interface is implemented to query the events in the MPEG-7 match corpus and view the corresponding video segments.
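As a rough illustration of the preprocessing and rule-based extraction steps described above, the following minimal sketch normalizes a match report with a regular expression and then applies two hand-written patterns. The event types, the phrasing the patterns expect, and the names `Event`, `RULES`, `preprocess`, and `extract_events` are assumptions made for this example; they are not the rules or the data model used in the thesis.

```python
import re
from dataclasses import dataclass


@dataclass
class Event:
    minute: int
    kind: str
    player: str


# One or more capitalized words, treated here as a player name (an assumption).
NAME = r"(?P<player>[A-Z][\w'-]+(?: [A-Z][\w'-]+)*)"
MINUTE = r"(?P<minute>\d+)(?:st|nd|rd|th) minute"

# Hypothetical hand-written rules: an event type paired with a pattern.
RULES = [
    ("goal", re.compile(NAME + r" scored in the " + MINUTE)),
    ("yellow_card", re.compile(NAME + r" was booked in the " + MINUTE)),
]


def preprocess(text: str) -> str:
    """Collapse runs of whitespace so that patterns can span line breaks."""
    return re.sub(r"\s+", " ", text).strip()


def extract_events(report: str) -> list[Event]:
    """Apply every rule to the cleaned report and return events by minute."""
    clean = preprocess(report)
    events = []
    for kind, pattern in RULES:
        for match in pattern.finditer(clean):
            events.append(
                Event(int(match.group("minute")), kind, match.group("player"))
            )
    return sorted(events, key=lambda e: e.minute)


if __name__ == "__main__":
    sample = (
        "Raul scored in the 23rd minute.\n"
        "  After the break, John Smith was booked in the 57th minute."
    )
    for event in extract_events(sample):
        print(event)
```

In a full pipeline the resulting events would then be serialized, for example to an MPEG-7 description, so that they can be queried and linked to video segments; that step is omitted here.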


