Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank

Date

2018-05-07

Author

Zeyrek Bozşahin, Deniz
Kurfalı, Murathan

We introduce the TED-Multilingual Discourse Bank (TED-MDB), a corpus of TED talk transcripts in 6 languages (English, German, Polish, European Portuguese, Russian and Turkish), whose ultimate aim is to provide a clearly described level of discourse structure and semantics in multiple languages. The corpus is manually annotated following the goals and principles of PDTB, covering explicit and implicit discourse connectives, entity relations, alternative lexicalizations and no relations. We also aim to capture the characteristics of spoken language present in the transcripts and adapt the PDTB scheme accordingly; for example, we introduce hypophora. We further identify other aspects of spoken discourse, such as the discourse marker use of connectives, so that such uses are kept distinct from discourse connective use. TED-MDB is, to the best of our knowledge, one of the few multilingual discourse treebanks, and we hope it will serve as a source of parallel data for contrastive linguistic analysis as well as language technology applications. We describe the corpus and the annotation procedure, and provide preliminary corpus statistics. © LREC 2018 - 11th International Conference on Language Resources and Evaluation. All rights reserved.

Subject Keywords

Discourse markers, Language technology

URI

https://hdl.handle.net/11511/86184
https://www.scopus.com/record/display.uri?eid=2-s2.0-85059897073&origin=resultslist&sort=plf-f&src=s&st1=&st2=&sid=879e879987b78a5c9ff576ff6384bb28&sot=b&sdt=b&sl=108&s=TITLE-ABS-KEY+%28Multilingual+Extension+of+PDTB-Style+Annotation%3a+The+Case+of+TED+Multilingual+Discourse+Bank%29&relpos=0&citeCnt=3&searchTerm=

Conference Name

LREC 2018 - 11th International Conference on Language Resources and Evaluation

Collections

Graduate School of Informatics, Conference / Seminar

Citation Formats

D. Zeyrek Bozşahin and M. Kurfalı, “Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank,” presented at LREC 2018 - 11th International Conference on Language Resources and Evaluation, Miyazaki, Japan, 2018, p. 1913. [Online]. Available: https://hdl.handle.net/11511/86184.