Achieving Representativeness Through the Parameters of Spoken Language and Discursive Features: The Case of the Spoken Turkish Corpus

2010-05-15
Ruhı, Şükriye
Işık Güler, Hale
Hatipoğlu, Çiler
Eröz Tuğa, Betil
Çokal Karadaş, Derya
In this paper we review the ongoing debate on achieving representativeness in general spoken corpora, with the purpose of proposing a model for spoken corpus design and construction workflows. The proposal is illustrated in the context of an ongoing implementation for the Spoken Turkish Corpus, a corpus that, in its initial stage, will consist of one million words of present-day Turkish spoken in Turkey. The paper proposes a cyclic workflow and design scheme based on the principles of an “agile” corpus design and annotation system (Voorman and Gut, 2008), and argues that a three-pronged set of feature criteria, namely demographic, contextual, and discursive features, can be fruitfully combined to monitor and achieve representativeness. The paper discusses the underlying principles of the design scheme and outlines the metadata features of the web-based corpus management system, which utilizes and complements EXMARaLDA tools (Schmidt, 2004) in corpus construction and monitoring.
Citation Formats
Ş. Ruhı, H. Işık Güler, Ç. Hatipoğlu, B. Eröz Tuğa, and D. Çokal Karadaş, “Achieving Representativeness Through the Parameters of Spoken Language and Discursive Features: The Case of the Spoken Turkish Corpus,” Valletta, Malta, 13-15 May 2010, vol. 2, p. 789, Accessed: 00, 2021. [Online]. Available: http://www.aelinco.es/sites/default/files/part_ii_l-z_language_windowing_through_corpora_5-2010.pdf.