Supervisor

David Gonzalez

Programme

MSc in Data Analytics

Subject

Computer Science

Abstract

Churn rates are remarkably high in the gambling industry: an extremely competitive landscape, coupled with a severe lack of brand loyalty among the customer base, makes churn prediction one of the main problems an operator will face. This paper explores the range of possible modelling solutions, with a key emphasis on ensemble learning as a way to improve on existing methods. During this exploration, a host of modelling techniques are formulated with a focus on scalability, facilitated by the Apache Spark distributed computing framework. Thirteen model variations, including single classifiers and ensemble families, are evaluated for their suitability in solving the problem. A limitation of the data set provided is that it is not diverse enough to capture the truly dynamic relationship between a customer and an operator. Nevertheless, the paper provides a host of solutions that satisfy the goal of scalability. The project provides three separate recommendations, which can be chosen according to the firm’s needs, recording ensemble classification precision scores of 86%, 87% and 95% respectively. The author shows that ensemble learning is a stronger predictive solution in the context of churn prediction in the gambling industry. In addition to demonstrating the power of ensemble learning, the paper provides an application, based on the author’s strongest modelling approach, that is applied to an unseen validation set. The application returns a list of account numbers for customers predicted to churn, which internal CRM teams can use to improve processes.

Date of Award

2025

Full Publication Date

2025

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis

Included in

Data Science Commons
