TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style

2020-06-01
Zeyrek Bozşahin, Deniz
Mendes, Amália
Grishina, Yulia
Kurfalı, Murathan
Gibbon, Samuel
Ogrodniczuk, Maciej
TED-Multilingual Discourse Bank, or TED-MDB, is a multilingual resource in which TED talks are annotated at the discourse level in six languages (English, Polish, German, Russian, European Portuguese, and Turkish) following the aims and principles of PDTB. We explain the corpus design criteria, which have three main features: the linguistic characteristics of the languages involved, the interactive nature of TED talks (which led us to annotate Hypophora), and the decision to avoid projection. We report our annotation consistency and post-annotation alignment experiments, and provide a cross-lingual comparison based on corpus statistics.

Citation Formats
D. Zeyrek Bozşahin, A. Mendes, Y. Grishina, M. Kurfalı, S. Gibbon, and M. Ogrodniczuk, “TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style,” Language Resources and Evaluation, vol. 54, no. 2, pp. 587–613, 2020. [Online]. Available: https://hdl.handle.net/11511/31690.