Event-related microblog retrieval in Turkish
Date
2022-01-01
Author
Toraman, Çağrı
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Microblogs, such as tweets, are short messages in which users share opinions and information. Many microblogs relate to real-life events reported in news articles. Finding event-related microblogs is important for analyzing online social networks and understanding public opinion on events. However, the task is challenging because microblogs are dynamic in nature and limited in length. In this study, we treat news articles as queries and microblogs as documents, and retrieve event-related microblogs in Turkish. To represent news articles and microblogs, we examine several encoding methods: the traditional bag-of-words model, and word embeddings provided by the deep-learning-based pretrained language models BERT and FastText. We compute the distance between the encoded news article and each encoded microblog as a measure of their text similarity, or relatedness. We then rank the microblogs by their relatedness to the input query. The experimental results show that (i) the BERT-based model outperforms the other encoding methods in Turkish, although bag-of-words with Dice similarity remains competitive on short text; (ii) the news title successfully represents the event as a query; and (iii) preprocessing Turkish microblogs has a positive impact on bag-of-words and FastText embeddings, while BERT embeddings are robust to noise in Turkish.
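The retrieval pipeline described in the abstract (encode the query and each microblog, score their similarity, rank by score) can be illustrated with the simplest of the encodings it mentions, bag-of-words with Dice similarity. The following is a minimal sketch, not the author's implementation: the function names, the regex tokenizer, and the English toy data are all assumptions made for illustration, and the preprocessing used in the study is not reproduced here.

```python
import re


def tokenize(text):
    """Return the set of lowercase word tokens in `text`.

    Assumption: a simple regex tokenizer stands in for real preprocessing.
    Note that str.lower() does not handle the Turkish dotted/dotless I
    correctly, so real Turkish text needs locale-aware normalization.
    """
    return set(re.findall(r"\w+", text.lower()))


def dice_similarity(a, b):
    """Dice coefficient between two token sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_microblogs(query, microblogs):
    """Return (score, microblog) pairs sorted from most to least related."""
    query_tokens = tokenize(query)
    scored = [(dice_similarity(query_tokens, tokenize(m)), m) for m in microblogs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: a news title as the query, microblogs as documents.
    news_title = "Earthquake hits coastal city"
    microblogs = [
        "Great match tonight",
        "City council meets about earthquake relief",
        "Strong earthquake hits the coastal city tonight",
    ]
    for score, text in rank_microblogs(news_title, microblogs):
        print(f"{score:.3f}  {text}")
```

Replacing `tokenize` and `dice_similarity` with an embedding model and a vector distance would give the embedding-based variants the abstract compares, with the ranking step unchanged.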
Subject Keywords
Microblogs, natural language processing, text similarity, text preprocessing, tweets, word embedding
URI
https://hdl.handle.net/11511/109615
Journal
TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES
DOI
https://doi.org/10.55730/1300-0632.3827
Collections
Department of Computer Engineering, Article
Citation (IEEE)
Ç. Toraman, “Event-related microblog retrieval in Turkish,” TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, vol. 30, no. 3, pp. 1067–1083, 2022. [Online]. Available: https://hdl.handle.net/11511/109615.