Supervisor
Dr. Muhammad Iqbal
Programme
MSc in Data Analytics
Subject
Computer Science
Abstract
Policy lapses are a persistent problem across the insurance industry. They affect a company’s profitability, cash flows and solvency, and high lapse rates can cause reputational damage that provokes a cycle of further lapses. It is therefore incumbent on a company to do its utmost to retain the business it has written for the full term for which it was written.
If a company could predict which of its policies were about to lapse, it could proactively attempt to prevent the lapse by contacting the policyholder and discussing how likely they are to leave. In this paper, various machine learning tools are applied to a set of policy and client data from a life assurance company. The tools include sentiment analysis, Random Forests, Artificial Neural Networks (ANN), k-Nearest Neighbour (kNN) and Support Vector Machine (SVM) classification. The metrics examined include sentiment over time, confusion matrices and the accuracy of the Random Forest, ANN, kNN and SVM models.
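As a minimal sketch of how accuracy relates to a confusion matrix for a binary lapse classifier, the following counts the four outcome types and derives accuracy from them. The labels are illustrative only (1 = lapsed, 0 = retained) and are not data from the study; the function names are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def accuracy(counts):
    """Accuracy = correctly classified cases / all cases."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Illustrative labels only; not taken from the thesis data.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
print(counts, accuracy(counts))
```

Accuracy alone can hide an imbalance between missed lapses (false negatives) and false alarms (false positives), which is one reason to inspect the full confusion matrix as well.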
The data is derived from one company’s life assurance policy and policyholder records, as stored on its administration systems and extracted to a SQL database. Because there are two different administration systems, the data from both had to be transformed into a canonical format. In addition, because the models used work best with numerical data, some categorical data had to be converted into numerical form. Separately, data reflecting policyholder sentiment was also captured.
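One common way to convert a categorical field into numerical form is one-hot encoding, sketched below. This is an assumption for illustration: the abstract does not state which encoding was used, and the field name `payment_frequency` and its values are hypothetical.

```python
def one_hot_encode(records, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in records})
    encoded = []
    for row in records:
        # Copy every field except the categorical one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical policy records; not taken from the thesis data.
policies = [
    {"policy_id": 1, "payment_frequency": "monthly"},
    {"policy_id": 2, "payment_frequency": "annual"},
    {"policy_id": 3, "payment_frequency": "monthly"},
]

encoded = one_hot_encode(policies, "payment_frequency")
for row in encoded:
    print(row)
```

One-hot encoding avoids implying an order between categories, at the cost of adding one column per distinct value.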
The modelling found that the Random Forest model was the most accurate, with accuracies of 88.18% for single life and 83.42% for joint life data. The kNN model ranked next, with accuracies of 87.53% and 81.38% respectively, followed by the ANN with 87.83% and 79.69% (kNN is ranked above the ANN, despite the ANN's slightly higher single life accuracy, because of its better overall performance). A Support Vector Machine (trained on the optimal parameters found by a 5-fold cross-validation over 12 combinations of parameters) correctly identified 84.99% of cases.
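The parameter search described for the SVM can be sketched as a grid search with k-fold cross-validation. The abstract does not say which parameters or values were searched, so the 4 × 3 grid over `C` and `gamma` below is a hypothetical choice that happens to give 12 combinations, and `toy_score` is a stand-in for training and scoring a real SVM on each fold.

```python
from itertools import product


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(param_grid, n_samples, fit_and_score, k=5):
    """Return (best_params, best_mean_score, number_of_fits)."""
    names = sorted(param_grid)
    folds = k_fold_indices(n_samples, k)
    best_params, best_score, n_fits = None, float("-inf"), 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(n_samples) if i not in held]
            # Train on k-1 folds, score on the held-out fold.
            scores.append(fit_and_score(params, train, held_out))
            n_fits += 1
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score, n_fits


def toy_score(params, train_idx, test_idx):
    """Hypothetical scorer standing in for fitting an SVM; peaks at C=10, gamma=0.1."""
    return -(abs(params["C"] - 10) + abs(params["gamma"] - 0.1))


# Hypothetical grid: 4 values of C x 3 values of gamma = 12 combinations.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
best_params, best_score, n_fits = grid_search(param_grid, 100, toy_score, k=5)
print(best_params, n_fits)
```

With 12 combinations and 5 folds, the search performs 60 model fits before the final model is trained on the chosen parameters. In practice the folds would usually be shuffled or stratified rather than contiguous.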
A market basket analysis was carried out (using the Apriori algorithm) to see which combinations of benefits were present in the customer base; the results are summarised in section 4.4 and detailed in the appendix. The results can be used to help customers add benefits to their policies (subject to the appropriate checks by a qualified intermediary).
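The frequent-itemset stage of the Apriori algorithm can be sketched as follows. This is a simplified illustration, not the implementation used in the study: the benefit names, the baskets and the `min_support` threshold are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([item]) for item in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical benefit combinations per policy; not taken from the thesis data.
policies = [
    {"life_cover", "serious_illness"},
    {"life_cover", "serious_illness", "income_protection"},
    {"life_cover"},
    {"life_cover", "income_protection"},
    {"serious_illness", "hospital_cash"},
]

frequent = apriori(policies, min_support=0.3)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
```

The prune step is what distinguishes Apriori from brute-force enumeration: a combination of benefits is only counted if all of its smaller sub-combinations are already known to be frequent.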
A possible future extension of this work is to split the modelling across multiple PCs, or to run it on a distributed system, so that training could run for longer (for example, more decision trees in the Random Forest, more training epochs for the neural networks or more parameter combinations in the grid search for the SVM). Other future work could consist of explicit modelling by product type, which would produce several smaller but better trained models. The input data could also be enhanced by adding data that is held on the administration systems but is not currently included in the extracts. This could provide more granular policy, life and transaction history information, which could refine the models.
Date of Award
2025
Full Publication Date
2025
Access Rights
open access
Document Type
Capstone Project
Resource Type
thesis
Recommended Citation
Cunningham, B. (2025) The Actuarial Applications of Machine Learning and Big Data in the Life Assurance industry: Managing customer retention and customer outcomes by the application of data science. CCT College Dublin.