Income Classification Benchmark: From R (Academic Study) to Python (ML Pipeline)
Date
2025-10-01
Author
Erkan, Mehmet Ali
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The primary aim of this research is to construct a robust machine learning pipeline for income classification: predicting whether an individual earns above $50K from demographic attributes such as work class, education, race, and gender. The project began as a statistical study in RStudio, used to explore relationships between variables and perform exploratory data analysis (EDA), and has since been substantially refactored into a production-ready Python environment to demonstrate modern MLOps standards. The methodology is an end-to-end Scikit-Learn pipeline that incorporates advanced data cleaning, K-Nearest Neighbors (KNN) imputation for missing values, and automated feature scaling. While the initial research explored a broad range of algorithms, the current benchmark compares Logistic Regression, Decision Tree, and Random Forest classifiers to establish a strong baseline. Model performance was assessed using Accuracy, Sensitivity, and F1-Score to account for categorical complexity. This dual-language approach highlights the transition from academic statistical inference to applied machine learning engineering.
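The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the author's code: the data below is synthetic, the feature layout (three numeric columns plus one integer-coded categorical column) is hypothetical, and one-hot encoding of the categorical column is an assumption, since the abstract does not say how categorical attributes are encoded. It only shows how KNN imputation, feature scaling, and the three classifiers could be combined in Scikit-Learn and scored with Accuracy, Sensitivity (recall on the positive class), and F1-Score.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the study's dataset).
rng = np.random.default_rng(0)
n = 600
numeric = rng.normal(size=(n, 3))            # hypothetical numeric attributes
category = rng.integers(0, 3, size=(n, 1))   # hypothetical coded category
# Hypothetical binary target standing in for "income above $50K".
y = (numeric[:, 0] + 0.5 * numeric[:, 1] + 0.3 * category[:, 0]
     + rng.normal(scale=0.5, size=n) > 0.3).astype(int)

# Remove roughly 10% of the numeric entries so imputation has work to do.
numeric[rng.random(numeric.shape) < 0.10] = np.nan
X = np.hstack([numeric, category.astype(float)])


def make_preprocessor():
    """KNN-impute and scale numeric columns; one-hot encode the coded column."""
    numeric_steps = Pipeline([
        ("impute", KNNImputer(n_neighbors=5)),
        ("scale", StandardScaler()),
    ])
    return ColumnTransformer([
        ("num", numeric_steps, [0, 1, 2]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), [3]),
    ])


models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

results = {}
for name, model in models.items():
    # Preprocessing lives inside the pipeline, so it is fit on training data only.
    pipe = Pipeline([("preprocess", make_preprocessor()), ("model", model)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "sensitivity": recall_score(y_test, pred),  # recall of the positive class
        "f1": f1_score(y_test, pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Keeping the imputer and scaler inside the `Pipeline` means they are fit only on the training split, which avoids leaking test-set statistics into preprocessing. The numbers printed here describe the synthetic data only and say nothing about the study's results.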
URI
https://zenodo.org/records/17662766
https://hdl.handle.net/11511/116937
DOI
https://doi.org/10.5281/zenodo.17662766
Collections
Department of Statistics, Article
Citation (IEEE)
M. A. Erkan, “Income Classification Benchmark: From R (Academic Study) to Python (ML Pipeline),” 2025. [Online]. Available: https://zenodo.org/records/17662766.