Show/Hide Menu
Hide/Show Apps
Logout
Türkçe
Türkçe
Search
Search
Login
Login
OpenMETU
OpenMETU
About
About
Open Science Policy
Open Science Policy
Open Access Guideline
Open Access Guideline
Postgraduate Thesis Guideline
Postgraduate Thesis Guideline
Communities & Collections
Communities & Collections
Help
Help
Frequently Asked Questions
Frequently Asked Questions
Guides
Guides
Thesis submission
Thesis submission
MS without thesis term project submission
MS without thesis term project submission
Publication submission with DOI
Publication submission with DOI
Publication submission
Publication submission
Supporting Information
Supporting Information
General Information
General Information
Copyright, Embargo and License
Copyright, Embargo and License
Contact us
Contact us
Large-Scale Hate Speech Detection with Cross-Domain Transfer
Date
2022-01-01
Author
Toraman, Çağrı
Sahinuc, Furkan
Yilmaz, Eyup Halit
Metadata
Show full item record
This work is licensed under a
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
.
Item Usage Stats
17
views
0
downloads
Cite This
The performance of hate speech detection models relies on the datasets on which the models are trained. Existing datasets are mostly prepared with a limited number of instances or hate domains that define hate topics. This hinders large-scale analysis and transfer learning with respect to hate domains. In this study, we construct large-scale tweet datasets for hate speech detection in English and a low-resource language, Turkish, consisting of human-labeled 100k tweets per each. Our datasets are designed to have equal number of tweets distributed over five domains. The experimental results supported by statistical tests show that Transformer-based language models outperform conventional bag-of-words and neural models by at least 5% in English and 10% in Turkish for large-scale hate speech detection. The performance is also scalable to different training sizes, such that 98% of performance in English, and 97% in Turkish, are recovered when 20% of training instances are used. We further examine the generalization ability of cross-domain transfer among hate domains. We show that 96% of the performance of a target domain in average is recovered by other domains for English, and 92% for Turkish. Gender and religion are more successful to generalize to other domains, while sports fail most.
URI
https://hdl.handle.net/11511/109689
Conference Name
13th International Conference on Language Resources and Evaluation (LREC)
Collections
Department of Computer Engineering, Conference / Seminar
Citation Formats
IEEE
ACM
APA
CHICAGO
MLA
BibTeX
Ç. Toraman, F. Sahinuc, and E. H. Yilmaz, “Large-Scale Hate Speech Detection with Cross-Domain Transfer,” presented at the 13th International Conference on Language Resources and Evaluation (LREC), Marseille, Fransa, 2022, Accessed: 00, 2024. [Online]. Available: https://hdl.handle.net/11511/109689.