Programme

HDIP in Data Analytics for Business

Subject

Computer Science

Abstract

Credit card fraud poses a significant challenge to financial institutions, causing substantial financial losses and eroding customer trust. This project develops and evaluates machine learning models to detect fraudulent credit card transactions using a large, realistic synthetic dataset. Following data preprocessing, exploratory analysis, and class balancing with SMOTE (Synthetic Minority Over-sampling Technique), four supervised models (Logistic Regression, Decision Tree, Random Forest, and XGBoost) were trained and compared. Performance was assessed using metrics suited to imbalanced classification: AUC, Recall, Precision, F1-score, and Average Precision (AP). Results show that XGBoost, particularly after hyperparameter optimisation, delivered the strongest performance (AUC 0.99, Recall 0.83, AP 0.70), outperforming the other models and identifying fraudulent activity with a manageable false-positive rate. The findings indicate that tree-based ensemble methods are well suited to fraud detection and that appropriate resampling and tuning strategies improve predictive performance.
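As an illustration of the evaluation metrics named in the abstract, the following is a minimal pure-Python sketch of Precision, Recall, F1-score, Average Precision, and AUC. It is not the project's code: the function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made for this example, and the Average Precision calculation assumes no tied scores.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at each rank where a true positive occurs.

    Assumes no tied scores (ties would need threshold-based grouping).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return 0.0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 fraudulent transactions out of 10.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.10, 0.40, 0.90, 0.20, 0.35, 0.05, 0.60, 0.80, 0.30, 0.15]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]
```

On this toy data the classifier catches two of the three frauds and raises one false alarm, so Precision, Recall, and F1 all equal 2/3. AUC and Average Precision are computed from the ranking of scores and do not depend on the threshold.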

Date of Award

2025

Full Publication Date

2025

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis

Included in

Data Science Commons
