Automated priority detection in software bugs: A comprehensive study on transformer-based encoders with contrastive learning, large language models and vector databases for enhanced efficiency

2024-1-25
Yılmaz, Eyüp Halit
Software development processes involve many challenges that require human effort and time. Over time, many tools and techniques have been developed to address these challenges and to automate parts of software development and maintenance. Software bug reports are textual descriptions, often accompanied by code snippets and error logs, written by users or developers to document operational failures of programs. These reports are later examined by the assigned developer to fix the bug. Automating the bug-fixing pipeline helps determine the most suitable developer to assign to a given bug report, predict the bug fix time, estimate a priority level, and so on. This thesis focuses on automated priority detection for software bug reports using state-of-the-art classification techniques. Widely successful transformer-based encoder classifiers are adapted to the software domain via fine-tuning on open-source datasets. Large Language Models (LLMs), on the other hand, are recently popularized transformer decoder networks specifically trained for text generation, which can be configured for priority class prediction. To shape the LLM output accurately into the desired format, Retrieval Augmented Generation (RAG) is used to condition the network on the downstream task and domain. Vector databases store representations of the textual content of bug reports and, during inference, retrieve related instances by cosine similarity.
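The retrieval step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. A bag-of-words `Counter` stands in for transformer encoder embeddings, and a Python list stands in for a vector database. The names `embed`, `InMemoryVectorStore`, and `build_prompt`, the Bugzilla-style P1-P5 label set, and the prompt wording are all hypothetical. The sketch only builds the few-shot prompt; it does not call an LLM.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a transformer encoder: a bag-of-words vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy vector store: keeps (embedding, text, label) and ranks by cosine similarity."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str, str]] = []

    def add(self, text: str, label: str) -> None:
        self.items.append((embed(text), text, label))

    def query(self, text: str, k: int = 2) -> list[tuple[str, str]]:
        q = embed(text)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [(t, label) for _, t, label in ranked[:k]]


def build_prompt(store: InMemoryVectorStore, report: str, k: int = 2) -> str:
    # Retrieved labeled reports become few-shot examples that condition the LLM
    # on the task and constrain its answer to a single priority label.
    shots = "\n\n".join(
        f"Report: {t}\nPriority: {label}" for t, label in store.query(report, k)
    )
    return (
        "Classify the priority of the bug report as one of P1-P5. "
        "Answer with the label only.\n\n"
        f"{shots}\n\nReport: {report}\nPriority:"
    )


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("application crashes on startup with null pointer exception", "P1")
    store.add("typo in settings dialog label", "P5")
    store.add("crash when saving file null pointer", "P2")
    print(build_prompt(store, "null pointer exception crash on startup"))
```

In a real pipeline, `embed` would be replaced by a (possibly contrastively fine-tuned) encoder and the list by an actual vector database; the prompt text returned by `build_prompt` would then be sent to the LLM, whose completion is parsed as the predicted priority.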
E. H. Yılmaz, “Automated priority detection in software bugs: A comprehensive study on transformer-based encoders with contrastive learning, large language models and vector databases for enhanced efficiency,” M.S. - Master of Science, Middle East Technical University, 2024.