Audio event detection on TV broadcast

Ozan, Ezgi Can
The availability of digital media has grown tremendously with rapid advances in storage and communication technologies. As a result, we now face the problem of indexing and browsing huge amounts of multimedia data. Because this volume of data cannot be indexed or browsed by hand, automatic indexing and browsing systems have been proposed. Audio event detection is a research area that aims to analyse audio data in a semantic and perceptual manner, offering a conceptual solution to this problem. In this thesis, a method for detecting several audio events in TV broadcasts is proposed. The method includes an audio segmentation stage to detect event boundaries, and broadcast audio is classified into 17 classes. The feature set for each event is obtained with a feature selection algorithm that selects suitable features from a large set of popular descriptors. Support Vector Machines and Gaussian Mixture Models are used as classifiers, and the proposed system achieved an average recall rate of 88% over the 17 audio events. Compared with results reported in the literature, the proposed method is promising.
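The reported figure of 88% is an average of recall over 17 event classes. The abstract does not say how the average is taken. The minimal Python sketch below assumes the common unweighted (macro) average of per-class recall, where each class's recall is the number of correctly detected instances divided by the number of true instances of that class. The function names and the toy class labels are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions of a class / true instances of that class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    hits = defaultdict(int)    # correctly labelled instances per true class
    totals = defaultdict(int)  # all instances per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}


def average_recall(y_true, y_pred):
    """Unweighted (macro) mean of per-class recall; an assumed averaging scheme."""
    recalls = per_class_recall(y_true, y_pred)
    if not recalls:
        raise ValueError("no labels given")
    return sum(recalls.values()) / len(recalls)


# Hypothetical toy labels for three event classes.
y_true = ["speech", "speech", "music", "music", "applause"]
y_pred = ["speech", "music", "music", "music", "speech"]
print(per_class_recall(y_true, y_pred))  # {'speech': 0.5, 'music': 1.0, 'applause': 0.0}
print(average_recall(y_true, y_pred))    # 0.5
```

Macro averaging gives every event class equal weight, so a frequent class such as speech cannot hide poor recall on rare events. A frequency-weighted average would instead equal overall accuracy on the test segments.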