AN INVESTIGATION OF ISSUE LABELING IN OPEN SOURCE SOFTWARE PROJECTS USING LARGE LANGUAGE MODELS

2024-09-06
Deniz, İrem Selin
In open source software projects, effective issue management is central to sustaining project success. Issue reports carry valuable information: they are created to report bugs, request new features, or ask questions about a software product. Because issue reports are numerous and vary widely in quality, accurate issue classification is needed to prioritize work and manage resources effectively. Properly assigned issue labels matter both for project management and for the reliability of research on issue management, since such research often treats the assigned labels as ground truth. This study assesses the reliability of assigned issue labels in open source software development projects, with the aim of improving issue management processes. Two datasets of issue reports were collected from open source software development projects hosted on GitHub. Experiments on issue label classification were conducted with state-of-the-art large language models. In addition, a qualitative analysis evaluated how well the assigned labels match the content of the issue reports. The empirical study revealed a significant mismatch between the assigned issue labels and the actual content of the issue reports. The study also demonstrated that state-of-the-art large language models are effective at predicting issue labels, while raising concerns about the reliability of issue labels in open source software development projects.
Citation Formats
İ. S. Deniz, “AN INVESTIGATION OF ISSUE LABELING IN OPEN SOURCE SOFTWARE PROJECTS USING LARGE LANGUAGE MODELS,” M.S. - Master of Science, Middle East Technical University, 2024.