HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs

Date

2024-01-01

Author

Uluoglakci, Cem
Taşkaya Temizel, Tuğba

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Hallucinations pose a significant challenge to the reliability and alignment of Large Language Models (LLMs), limiting their widespread acceptance beyond chatbot applications. Despite ongoing mitigation efforts, hallucinations remain prevalent. Detecting them is itself a formidable task, often requiring manual labeling or constrained evaluation settings. This paper introduces an automated, scalable framework that combines benchmarking LLMs’ hallucination tendencies with efficient hallucination detection. We use LLMs to generate challenging tasks about hypothetical phenomena and then employ LLMs as evaluator agents to detect hallucinations efficiently. The framework is domain-agnostic: any language model can be used to create or evaluate a benchmark in any domain. We introduce the publicly available HypoTermQA Benchmarking Dataset, on which state-of-the-art models’ performance ranged between 3% and 11%, while the evaluator agents showed a 6% error rate in hallucination prediction. The proposed framework offers opportunities to test and improve LLMs, and it could also be used to generate benchmarking datasets tailored to specific domains such as law, health, and finance.
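The record does not define how the reported percentages are computed. As a rough illustration only, the sketch below assumes that a model's benchmark score is the share of its answers that an evaluator agent judges free of hallucination, and that the evaluator error rate is the share of evaluator verdicts that disagree with manual labels. The names `JudgedAnswer`, `benchmark_score`, and `evaluator_error_rate` are hypothetical and are not taken from the HypoTermQA release; the toy inputs are not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class JudgedAnswer:
    """Hypothetical record for one model answer about a hypothetical term."""
    predicted_hallucination: bool  # evaluator agent's verdict (assumed boolean)
    human_hallucination: bool      # manual gold label (assumed available for a sample)


def benchmark_score(verdicts: list[bool]) -> float:
    """Assumed metric: fraction of answers judged NOT to hallucinate."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    return sum(1 for hallucinated in verdicts if not hallucinated) / len(verdicts)


def evaluator_error_rate(answers: list[JudgedAnswer]) -> float:
    """Assumed metric: fraction of evaluator verdicts that disagree with the manual label."""
    if not answers:
        raise ValueError("no labelled answers")
    mismatches = sum(a.predicted_hallucination != a.human_hallucination for a in answers)
    return mismatches / len(answers)


if __name__ == "__main__":
    # Toy verdicts: 9 of 10 answers judged as hallucinated, 1 judged valid.
    verdicts = [True] * 9 + [False]
    print(benchmark_score(verdicts))  # 0.1

    # Toy labels: the evaluator disagrees with the human label once out of four.
    sample = [
        JudgedAnswer(True, True),
        JudgedAnswer(True, False),
        JudgedAnswer(False, False),
        JudgedAnswer(True, True),
    ]
    print(evaluator_error_rate(sample))  # 0.25
```

Keeping evaluator verdicts and manual labels on the same record lets one sample of answers feed both assumed metrics.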

URI

https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85188679789&origin=inward
https://hdl.handle.net/11511/109325

Conference Name

18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Student Research Workshop, SRW 2024

Collections

Graduate School of Informatics, Conference / Seminar

Citation Formats

C. Uluoglakci and T. Taşkaya Temizel, “HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs,” presented at the 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Student Research Workshop, SRW 2024, St. Julian’s, Malta, 2024. [Online]. Available: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85188679789&origin=inward.