Towards a Programmable Humanizing AI through Scalable Stance-Directed Architecture

2024-01-01
Çetinkaya, Yusuf Mucahit
Lee, Yeonjung
Külah, Emre
Toroslu, İsmail Hakkı
Cowan, Michael A.
Davulcu, Hasan
The rise of harmful online content underscores the urgent need for AI systems that can effectively detect and filter such content and foster safer, healthier communication. This article introduces a novel approach to mitigating the toxic content generation propensities of Large Language Models (LLMs) by fine-tuning them with a programmable, stance-directed focus on core human values and the common good. We propose a streamlined keyword coding and processing pipeline that generates weakly labeled data for training AI models that avoid toxicity and champion civil discourse. We also develop a toxicity classifier and an Aspect-Based Sentiment Analysis (ABSA) model to assess and control the effectiveness of the humanizing AI model. We evaluate the proposed pipeline on a contentious real-world Twitter dataset concerning U.S. race relations. Our approach curbs the toxic content generation propensity of an unrestricted LLM by a significant 85%.
IEEE Internet Computing
Citation Formats
Y. M. Çetinkaya, Y. Lee, E. Külah, İ. H. Toroslu, M. A. Cowan, and H. Davulcu, "Towards a Programmable Humanizing AI through Scalable Stance-Directed Architecture," IEEE Internet Computing, 2024. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85203522810&origin=inward.