Evaluation of voice activity and voicing detection

Kotnik, Bojan
Sendorek, Pierre
Astrov, Sergey
Docío Fernández, Laura
Banga, Eduardo Rodríguez
Höge, Harald
Kacic, Zdravko
Koç, Turgay
Çiloğlu, Tolga
This paper describes the ECESS evaluation campaign for voice activity and voicing detection. Standard VAD classifies a signal into speech and non-speech; we extend it to VAD+, which classifies a signal as a sequence of non-speech, voiced, and unvoiced segments. The evaluation is performed on a portion of the Spanish SPEECON database with manually labeled segmentation. To avoid errors caused by the limited precision of manual labeling, we introduce "dead zones": tolerance intervals of ±5 ms around label changes in the data set. The signal is not evaluated inside these tolerance intervals.
INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association
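
The dead-zone idea from the abstract can be illustrated with a small sketch. It is not the campaign's scoring tool: the segment format (start and end in integer milliseconds plus a label), the label names "N", "V", and "U", the 1 ms scoring step, and the function names are all assumptions made for illustration. Only the ±5 ms tolerance around reference label changes comes from the abstract.

```python
from typing import List, Optional, Tuple

# Hypothetical segment format: (start_ms, end_ms, label), end exclusive.
Segment = Tuple[int, int, str]


def label_at(segments: List[Segment], t_ms: int) -> Optional[str]:
    """Return the label of the segment covering time t_ms, or None."""
    for start, end, label in segments:
        if start <= t_ms < end:
            return label
    return None


def error_rate(reference: List[Segment], hypothesis: List[Segment],
               tolerance_ms: int = 5, step_ms: int = 1) -> float:
    """Fraction of scored instants where hypothesis and reference disagree.

    Instants within tolerance_ms of a reference label change fall in a
    dead zone and are skipped (assumed inclusive at both ends).
    """
    # A label change is the start of any segment whose label differs
    # from the label of the preceding segment.
    changes = [reference[i][0] for i in range(1, len(reference))
               if reference[i][2] != reference[i - 1][2]]
    total = errors = 0
    for t in range(reference[0][0], reference[-1][1], step_ms):
        if any(abs(t - c) <= tolerance_ms for c in changes):
            continue  # inside a dead zone: not evaluated
        total += 1
        if label_at(reference, t) != label_at(hypothesis, t):
            errors += 1
    return errors / total if total else 0.0


# Illustrative labels: N = non-speech, V = voiced, U = unvoiced.
reference = [(0, 100, "N"), (100, 200, "V"), (200, 300, "U")]
hyp_close = [(0, 103, "N"), (103, 200, "V"), (200, 300, "U")]  # 3 ms late
hyp_far = [(0, 120, "N"), (120, 200, "V"), (200, 300, "U")]    # 20 ms late

print(error_rate(reference, hyp_close))
print(error_rate(reference, hyp_far))
```

In this sketch a boundary placed 3 ms late is not penalized, because every disagreeing instant lies inside a dead zone, while a boundary placed 20 ms late is still counted as an error for the instants outside the tolerance interval.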


