Evaluating cross-lingual textual similarity on dictionary alignment problem

Date

2020-06-01

Author

Sever, Yiğit

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Bilingual and even polylingual word embeddings have created many possibilities for tasks involving multiple languages. Some tasks, such as cross-lingual information retrieval, aim to satisfy users' multilingual information needs, while others transfer valuable information from resource-rich languages to resource-poor ones. In either case, it is important to build and evaluate methods that operate in a cross-lingual setting. In this paper, Wordnet definitions in 7 languages are used to create a testbed for evaluating cross-lingual textual semantic similarity methods. A document alignment task is defined over the Wordnet glosses of synsets in these 7 languages. Unsupervised textual similarity methods (Wasserstein distance, Sinkhorn distance and cosine similarity) are compared with a supervised Siamese deep learning model. The task is modeled both as a retrieval task and as an alignment task to investigate the hubness of the semantic similarity functions. Our findings indicate that considering the problem as a retrieval and alignment problem has a detrimental effect on the results. Furthermore, we show that cross-lingual textual semantic similarity can be used as an automated Wordnet construction method.
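To make the distinction between the retrieval and alignment formulations concrete, here is a minimal sketch using cosine similarity only. It is not the paper's implementation: the two-dimensional vectors are hypothetical stand-ins for gloss embeddings, the function names are illustrative, and the brute-force search over permutations is a toy substitute for a proper assignment solver. Retrieval lets each source gloss pick its nearest target independently, so one target can be chosen by several sources (a hub); alignment forces a one-to-one matching.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(sources, targets):
    """sim[i][j] is the cosine similarity of source i and target j."""
    return [[cosine(s, t) for t in targets] for s in sources]


def retrieve(sim):
    """Retrieval view: each source independently takes its best target.

    Several sources may select the same target, which is how hubs appear.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def align(sim):
    """Alignment view: one-to-one matching maximizing total similarity.

    Brute force over all permutations; only feasible for toy sizes.
    """
    n = len(sim)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(sim[i][perm[i]] for i in range(n)),
    )
    return list(best)


# Hypothetical embeddings; the intended pairing is source i <-> target i.
sources = [(1.0, 0.1), (0.9, 0.5), (0.6, 0.8)]
targets = [(1.0, 0.0), (0.8, 0.6), (0.0, 1.0)]

sim = similarity_matrix(sources, targets)
retrieved = retrieve(sim)  # target 1 is picked by two sources (a hub)
aligned = align(sim)       # each target is used exactly once

print("retrieval:", retrieved)
print("alignment:", aligned)
```

In this toy case the middle target attracts two sources under retrieval, while the one-to-one constraint recovers the intended pairing. Whether that pattern holds on real gloss embeddings depends on the data and the similarity function.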

Subject Keywords

Linguistics and Language, Education, Library and Information Sciences, Language and Linguistics, Cross-lingual textual semantic similarity, Word embeddings, Wasserstein distance, Sinkhorn distance, Siamese neural network

URI

https://hdl.handle.net/11511/35148

Journal

LANGUAGE RESOURCES AND EVALUATION

DOI

https://doi.org/10.1007/s10579-020-09498-1

Collections

Department of Computer Engineering, Article

Citation Formats

Y. Sever, “Evaluating cross-lingual textual similarity on dictionary alignment problem,” LANGUAGE RESOURCES AND EVALUATION, pp. 0–0, 2020, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/35148.