Computing the Truck Factor in a Software Repository: A Machine Learning Approach
Date
2024-7
Author
El Cheikh Ammar, Ahmad
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In every software engineering project, it is crucial to know which members play a key role in its progress, so that the loss of any one of them does not halt the project’s advancement. This is where the Truck Factor (or Bus Factor) comes into play: a metric that identifies the developers whose removal (for example, by being hit by a truck or bus) would cause the development process to slow down. Measuring the truck factor is complex because of the many variables involved. Several algorithms have been developed to address this, using data from version control systems, where developers "commit" changes. This data records who changed what, when, and where, giving algorithms that study the Truck Factor access to a large amount of data. Existing algorithms, however, tend to focus narrowly on code-centric metrics such as the number of commits made by a developer. While such a feature is important in assessing a developer’s contribution, it does not tell the whole story. Hence, this thesis examines the features used by the algorithms in the literature and designs a feature set that covers coding-based metrics, collaborative behaviours, developer activity patterns, and the broader technological context of a project. Multiple supervised machine learning models based on different algorithms, such as Random Forest and Naive Bayes, are then built on this feature set to predict the key contributors in GitHub repositories, from which the truck factor is computed. Random Forest with tuned hyperparameters and an aggregated model combining the tuned Random Forest with Naive Bayes with priors achieve the best performance, with mean F1-scores of 84% and 86%, respectively. These models outperform existing algorithms, with consistently high precision and recall across most repositories, demonstrating robust identification of true Truck Factor members.
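The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the developer names, and the assumption that the truck factor equals the number of developers a model labels as key contributors are all hypothetical and chosen only for illustration. The sketch shows how precision, recall, and F1 could be computed per repository by comparing predicted key contributors against a reference set, and how a mean F1-score across repositories could be obtained.

```python
from statistics import mean


def truck_factor(predictions: dict[str, bool]) -> int:
    """Count developers predicted to be key contributors.

    Assumption (not from the thesis): the truck factor is taken to be
    the size of the predicted key-contributor set.
    """
    return sum(1 for is_key in predictions.values() if is_key)


def precision_recall_f1(predicted: set[str], actual: set[str]) -> tuple[float, float, float]:
    """Compare predicted key contributors with a reference set."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_f1(per_repo: list[tuple[set[str], set[str]]]) -> float:
    """Average the F1-score over (predicted, actual) pairs, one per repository."""
    return mean(precision_recall_f1(pred, act)[2] for pred, act in per_repo)


if __name__ == "__main__":
    # Hypothetical per-developer predictions for one repository.
    preds = {"alice": True, "bob": True, "carol": False}
    print("truck factor:", truck_factor(preds))

    predicted = {name for name, is_key in preds.items() if is_key}
    actual = {"alice", "dave"}  # hypothetical reference labels
    print("precision, recall, F1:", precision_recall_f1(predicted, actual))
```

Averaging F1 per repository, rather than pooling all developers into one set, keeps large repositories from dominating the score; whether the thesis averages this way is not stated in the abstract.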
Subject Keywords
Truck Factor, Bus Factor, Machine Learning, Software Repositories, Version Control System, Random Forest, Naive Bayes
URI
https://hdl.handle.net/11511/111566
Collections
Northern Cyprus Campus, Thesis
Citation (IEEE)
A. El Cheikh Ammar, “Computing the Truck Factor in a Software Repository: A Machine Learning Approach,” M.S. - Master of Science, Middle East Technical University, 2024.