Supervisor

Taufique Ahmed

Programme

HDIP in Data Analytics for Business

Abstract

This capstone project applies machine learning to credit card fraud detection, a critical financial threat to banks and payment providers. The study uses an anonymised dataset of 284,807 transactions that is highly imbalanced: only 0.172% are fraudulent. Three models (Logistic Regression, Random Forest, and Gradient Boosting) were developed and evaluated. The pipeline incorporates data preprocessing, feature engineering, hyperparameter tuning, cross-validation, and interpretability analysis using SHAP values, Shapash, and permutation importance. Random Forest performed best, with an ROC AUC of 0.97 and an Average Precision of 0.66. The study also considers fairness, threshold optimisation, and practical deployment strategies for automatically flagging suspicious activity.

Date of Award

2025

Full Publication Date

2025

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis

Included in

Data Science Commons
