SPEECH DETECTION ON BROADCAST AUDIO

2010-08-27
Zubari, Unal
Ozan, Ezgi Can
Acar, Banu Oskay
Çiloğlu, Tolga
Esen, Ersin
Ates, Tugrul K.
Onur, Duygu Oskay
Speech boundary detection contributes to the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in three steps. In the first step, audio data is segmented into homogeneous regions in an unsupervised manner. After an ACTIVITY/NON-ACTIVITY decision is made for each region, ACTIVITY regions are classified as Speech/Non-speech via Gaussian Mixture Model (GMM) based classification. GMMs are trained using a novel feature, Spectral Flow Direction (SFD), and an improved multi-band harmonicity feature in addition to the widely used Mel Frequency Cepstral Coefficients (MFCCs).
18th European Signal Processing Conference (EUSIPCO)
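
The three-step flow described in the abstract (unsupervised segmentation, an ACTIVITY/NON-ACTIVITY decision per region, then GMM-based Speech/Non-speech classification) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption: the function names, the change-detection rule used for segmentation, the energy floor, the two-dimensional placeholder features (standing in for SFD, harmonicity, and MFCC features), and the hand-picked GMM parameters, which a real system would estimate from training data.

```python
import math

def gmm_loglik(x, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.
    `gmm` is a list of (weight, means, variances) tuples (assumed layout)."""
    comps = []
    for w, mu, var in gmm:
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comps.append(ll)
    top = max(comps)
    # log-sum-exp over components for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps))

def segment(frames, jump=1.0):
    """Step 1 (stand-in rule): open a new region whenever two consecutive
    frames are more than `jump` apart in Euclidean distance."""
    regions = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if math.dist(prev, cur) > jump:
            regions.append([])
        regions[-1].append(cur)
    return regions

def is_active(region, energy_floor=0.1):
    """Step 2 (assumed rule): the first feature is treated as frame energy;
    a region is active if its mean energy exceeds the floor."""
    return sum(f[0] for f in region) / len(region) > energy_floor

def classify(region, speech_gmm, nonspeech_gmm):
    """Step 3: compare total log-likelihood of the region under each GMM."""
    s = sum(gmm_loglik(f, speech_gmm) for f in region)
    n = sum(gmm_loglik(f, nonspeech_gmm) for f in region)
    return "speech" if s > n else "non-speech"

def detect(frames, speech_gmm, nonspeech_gmm):
    """Run the three steps and return (label, region_length) pairs."""
    labels = []
    for region in segment(frames):
        if not is_active(region):
            labels.append(("non-activity", len(region)))
        else:
            labels.append((classify(region, speech_gmm, nonspeech_gmm), len(region)))
    return labels

# Hypothetical toy data: (energy, placeholder feature) per frame.
frames = [(0.0, 0.0)] * 3 + [(1.0, 2.0)] * 4 + [(1.0, -2.0)] * 3
speech_gmm = [(0.5, (1.0, 2.0), (0.5, 0.5)), (0.5, (1.0, 3.0), (0.5, 0.5))]
nonspeech_gmm = [(1.0, (1.0, -2.0), (0.5, 0.5))]
result = detect(frames, speech_gmm, nonspeech_gmm)
print(result)
```

On this toy input the sketch yields one silent region, one region closer to the speech model, and one closer to the non-speech model. Classifying whole regions rather than single frames mirrors the ordering in the abstract, where segmentation precedes classification.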

Suggestions

Sign language recognition by image analysis
Büyüksaraç, Buket; Bulut, Mehmet Mete; Akar, Gözde; Department of Electrical and Electronics Engineering (2015)
The Sign Language Recognition (SLR) problem is a highly important research topic because of its ability to increase interaction between people who are hearing-impaired or have a speech impediment. However, existing methods have several limitations. Most applications impose requirements such as making the user wear multi-colored or sensor-based gloves, or using a specific camera. We propose a simple but robust system that can be used without the need of any specific accessories. The pro...
Bimodal automatic speech segmentation and boundary refinement techniques
Akdemir, Eren; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2010)
Automatic segmentation of speech is compulsory for building large speech databases to be used in speech processing applications. This study proposes a bimodal automatic speech segmentation system that uses either articulator motion information (AMI) or visual information obtained by a camera in collaboration with auditory information. The presence of visual modality is shown to be very beneficial in speech recognition applications, improving the performance and noise robustness of those systems. In this dis...
Optimizing core signal processing functions on a superscalar SIMD architecture
Uslu, Çağrı; Bazlamaçcı, Cüneyt Fehmi; Department of Electrical and Electronics Engineering (2019)
Digital Signal Processing (DSP) is the basis of many technologies, such as Image Processing, Speech Recognition, Radars, etc. The use of electronic devices such as smartphones, smartwatches, self-driving cars and autonomous robots that take advantage of these technologies is becoming widespread, and hence it is more critical than ever for these technologies to be realized with high efficiency on cheaper and less power-hungry devices. Cortex-A15 processor architecture is one of the solutions from ARM to this requi...
Speech recognition on mobile devices in noisy environments
Yurtcan, Yaser; Günel Kılıç, Banu (2018-05-05)
The use of speech recognition on mobile devices has been possible with the development of cloud systems and has been used for about 10 years. However, in noisy environments, the problem of speech recognition with low error rate still persists. In this study, different speech samples have been recorded using a compact microphone array in noisy environments and a data set has been created by processing them with a real-time noise cancellation algorithm. Speech recognition performance has been tested on the ge...
Content-Based Classification and Segmentation of Mixed-Type Audio by Using MPEG-7 Features
Dogan, Ebru; SERT, MUSTAFA; Yazicit, Adnan (2009-07-25)
This paper describes the development of a generated solution for classification and segmentation of broadcast news audio. A sound stream is segmented by classifying each sub-segment into silence, pure speech, music, environmental sound, speech over music, and speech over environmental sound classes in multiple steps. Support Vector Machines and Hidden Markov Models are employed for classification and these models are trained by using different sets of MPEG-7 features. A series of tests was conducted on hand...
Citation Formats
U. Zubari et al., “SPEECH DETECTION ON BROADCAST AUDIO,” presented at the 18th European Signal Processing Conference (EUSIPCO), Aalborg, DENMARK, 2010, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/53077.