SPEECH DETECTION ON BROADCAST AUDIO
Date
2010-08-27
Author
Zubari, Unal
Ozan, Ezgi Can
Acar, Banu Oskay
Çiloğlu, Tolga
Esen, Ersin
Ates, Tugrul K.
Onur, Duygu Oskay
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Speech boundary detection contributes to the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in three steps. In the first step, audio data is segmented into homogeneous regions in an unsupervised manner. After an ACTIVITY/NON-ACTIVITY decision is made for each region, ACTIVITY regions are classified as Speech/Non-speech via Gaussian Mixture Model (GMM) based classification. The GMMs are trained using a novel feature, Spectral Flow Direction (SFD), and an improved multi-band harmonicity feature, in addition to the widely used Mel Frequency Cepstral Coefficients (MFCCs).
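The last two steps of the pipeline described above (the activity decision, then GMM-based Speech/Non-speech classification) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the regions are assumed to be already segmented, the activity decision is assumed to be a simple mean-energy threshold, the features are scalar stand-ins for the SFD, harmonicity and MFCC vectors, and the mixture parameters are hand-picked rather than trained.

```python
import math

# Hypothetical 1-D mixtures as (weight, mean, variance) triples.
# A real system would train multi-dimensional GMMs on SFD,
# harmonicity and MFCC features; these values are illustrative only.
SPEECH_GMM = [(0.6, 1.0, 0.5), (0.4, 2.5, 1.0)]
NONSPEECH_GMM = [(0.5, -1.0, 0.5), (0.5, -2.5, 1.0)]


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture."""
    total = 0.0
    for weight, mean, var in gmm:
        norm = 1.0 / math.sqrt(2.0 * math.pi * var)
        total += weight * norm * math.exp(-((x - mean) ** 2) / (2.0 * var))
    return math.log(total + 1e-300)  # small constant guards against log(0)


def is_active(energies, threshold=0.1):
    """Assumed ACTIVITY test: mean frame energy above a fixed threshold."""
    return sum(energies) / len(energies) > threshold


def classify_region(energies, features):
    """Label one pre-segmented region."""
    if not is_active(energies):
        return "NON-ACTIVITY"
    # Sum per-frame log-likelihoods and pick the better-scoring model.
    speech = sum(gmm_log_likelihood(x, SPEECH_GMM) for x in features)
    nonspeech = sum(gmm_log_likelihood(x, NONSPEECH_GMM) for x in features)
    return "Speech" if speech > nonspeech else "Non-speech"


# Three toy regions standing in for the output of the segmentation step.
regions = [
    {"energies": [0.01, 0.02], "features": [0.0, 0.1]},
    {"energies": [0.8, 0.9], "features": [1.2, 2.0]},
    {"energies": [0.7, 0.6], "features": [-1.5, -2.2]},
]
labels = [classify_region(r["energies"], r["features"]) for r in regions]
print(labels)
```

Comparing summed log-likelihoods under two class models is the standard maximum-likelihood decision rule for GMM classifiers; whether the authors apply priors, smoothing or a different decision rule is not stated in the abstract.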
Subject Keywords
Classification, Music
URI
https://hdl.handle.net/11511/53077
Conference Name
18th European Signal Processing Conference (EUSIPCO)
Collections
Department of Electrical and Electronics Engineering, Conference / Seminar
Suggestions
Sign language recognition by image analysis
Büyüksaraç, Buket; Bulut, Mehmet Mete; Akar, Gözde; Department of Electrical and Electronics Engineering (2015)
The Sign Language Recognition (SLR) problem is a highly important research topic because of its potential to increase interaction among people who are hearing-impaired or have a speech impediment. However, existing methods have several limitations. Most applications impose requirements such as having the user wear multi-colored or sensor-based gloves, or using a specific camera. We propose a simple but robust system that can be used without the need of any specific accessories. The pro...
Bimodal automatic speech segmentation and boundary refinement techniques
Akdemir, Eren; Çiloğlu, Tolga; Department of Electrical and Electronics Engineering (2010)
Automatic segmentation of speech is compulsory for building large speech databases to be used in speech processing applications. This study proposes a bimodal automatic speech segmentation system that uses either articulator motion information (AMI) or visual information obtained by a camera in collaboration with auditory information. The presence of visual modality is shown to be very beneficial in speech recognition applications, improving the performance and noise robustness of those systems. In this dis...
Optimizing core signal processing functions on a superscalar SIMD architecture
Uslu, Çağrı; Bazlamaçcı, Cüneyt Fehmi; Department of Electrical and Electronics Engineering (2019)
Digital Signal Processing (DSP) is the basis of many technologies, such as Image Processing, Speech Recognition, Radars, etc. Use of electronic devices such as smartphones, smartwatches, self-driving cars and autonomous robots that take advantage of these technologies becomes widespread and hence it is more critical than ever for these technologies to be realized with high efficiency on cheaper and less power-hungry devices. Cortex-A15 processor architecture is one of the solutions from ARM to this requi...
Optical flow based video frame segmentation and segment classification
Akpınar, Samet; Alpaslan, Ferda Nur; Department of Computer Engineering (2018)
Video information retrieval is a field of multimedia research enabling us to extract desired semantic information from video data. In content-based video information retrieval, visual content obtained from video scenes is utilized. For developing methods to cope with content-based video information retrieval in terms of temporal concepts such as action, event, etc., representation of temporal information becomes critical. In this thesis, action detection is tackled based on a temporal video representation m...
Speech recognition on mobile devices in noisy environments
Yurtcan, Yaser; Günel Kılıç, Banu (2018-05-05)
The use of speech recognition on mobile devices has been possible with the development of cloud systems and has been used for about 10 years. However, in noisy environments, the problem of speech recognition with low error rate still persists. In this study, different speech samples have been recorded using a compact microphone array in noisy environments and a data set has been created by processing them with a real-time noise cancellation algorithm. Speech recognition performance has been tested on the ge...
Citation Formats
IEEE
U. Zubari et al., “SPEECH DETECTION ON BROADCAST AUDIO,” presented at the 18th European Signal Processing Conference (EUSIPCO), Aalborg, Denmark, 2010, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/53077.