MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection

2024-01-01
Toraman, Çağrı
Ozcelik, Oguzhan
Şahinuç, Furkan
Can, Fazli
The rapid dissemination of misinformation through online social networks poses a pressing issue with harmful consequences jeopardizing human health, public safety, democracy, and the economy; therefore, urgent action is required to address this problem. In this study, we construct a new human-annotated dataset, called MiDe22, having 5,284 English and 5,064 Turkish tweets with their misinformation labels for several recent events between 2020 and 2022, including the Russia-Ukraine war, COVID-19 pandemic, and Refugees. The dataset includes user engagements with the tweets in terms of likes, replies, retweets, and quotes. We also provide a detailed data analysis with descriptive statistics and the experimental results of a benchmark evaluation for misinformation detection.
Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Citation Formats
Ç. Toraman, O. Ozcelik, F. Şahinuç, and F. Can, “MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection,” presented at the Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, Torino, İtalya, 2024, Accessed: 00, 2024. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85195974364&origin=inward.