Supervisor

Taufique Ahmed

Programme

MSc in Data Analytics

Subject

Computer Science

Abstract

This research presents a framework for exploring workforce attrition that integrates data segmentation, feature selection, and classification models. K-prototypes clustering and generation-based segmentation are applied to create data subsets, which are then processed using logistic regression, random forest, naïve Bayes, and artificial neural networks (ANNs). By segmenting the data, the study aims to enhance the models' classification capacity. To address data imbalance and improve the models' ability to identify the minority class, the synthetic minority oversampling technique (SMOTE) and the adaptive synthetic sampling approach (ADASYN) are applied, with the average recall rate improving from 0.39 to 0.61 (SMOTE) and 0.50 (ADASYN). Feature selection using Recursive Feature Elimination (RFE) with linear SVC and logistic regression estimators is implemented to generate the experimental data subsets: the complete set of 29 independent features, and feature-reduced datasets with 20 features (linear SVC RFE) and 14 features (logistic RFE). The machine learning models and ANNs are then trained and optimized on all experimental data subsets to identify the best segmentation and feature selection approach and to explore the potential for improving evaluation metrics. This combined approach shows a significant improvement in classification capacity: the top-performing naïve Bayes model on cluster-based segmentation achieves accuracy between 0.86 and 0.9 and recall between 0.93 and 1.0 across five clusters. For generation-based segmentation, naïve Bayes achieves accuracy and recall between 0.76 and 1.0 across four generations. The proposed methodology and results demonstrate that a shift from generalized to more focused, segment-based HR strategies has the potential to address the workforce attrition problem, optimize resources, and reduce costs.
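
The oversampling idea behind SMOTE, as named in the abstract, can be illustrated with a minimal sketch. This is not the implementation used in the thesis: the function name `smote_sketch`, the toy data, and the parameters `n_new`, `k`, and `seed` are hypothetical, and the sketch handles numeric features only, whereas attrition data typically also contains categorical features. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified, numeric-only illustration of the SMOTE idea; all names here
    are hypothetical and not taken from the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with two numeric features (made-up values).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay within the range already covered by the minority class. In practice, a maintained library implementation would normally be preferred over a hand-written one.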

Date of Award

2025

Full Publication Date

2025

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis

Included in

Data Science Commons
