Programme

HDIP in Data Analytics for Business

Abstract

This capstone project investigates the application of machine learning and natural language processing (NLP) to customer support operations through automated ticket classification, prioritization, and summarization. Using the multilingual Customer Support Emails dataset from Kaggle and following the CRISP-DM methodology, the project performs extensive data cleaning, preprocessing, feature engineering, and class balancing. Five machine learning models (Decision Tree, k-Nearest Neighbours (KNN), LinearSVC, Naive Bayes, and Random Forest) were evaluated using hyperparameter tuning, cross-validation, confusion matrix analysis, and learning curves. LinearSVC performed best for both queue and priority classification, achieving accuracies of 89.8% and 81.2%, respectively, with consistent results across cross-validation folds. For summarization, extractive and abstractive methods were implemented using BERT and BART, respectively, and extractive summarization was selected as the more reliable method for preserving technical accuracy. The results indicate that machine learning can improve ticket routing efficiency, help reduce resolution time, and support customer service agents by providing concise issue overviews. This work presents a practical framework for integrating AI into customer support workflows while addressing ethical considerations such as data privacy, fairness, and robustness.
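The reason extractive summarization preserves technical accuracy can be illustrated with a minimal sketch. The project used BERT for extractive summarization. This example instead swaps in a simple word-frequency sentence scorer so that it runs with the standard library alone. The sample ticket, the stop-word list, and the function names (`split_sentences`, `tokenize`, `extractive_summary`) are hypothetical. The property it demonstrates is that every summary sentence is copied verbatim from the ticket, so details such as error codes are never paraphrased.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "on", "my",
              "i", "it", "for", "with", "we", "this", "after", "when", "not",
              "be", "are", "was"}


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9_-]+", sentence.lower())
            if w not in STOP_WORDS]


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, verbatim and in original order.

    Assumed scoring rule (a stand-in for BERT): a sentence's score is the mean
    document frequency of its content words, so sentences built from
    frequently repeated terms rank higher.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Rank by descending score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: (-score(sentences[i]), i))
    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked[:k])]


# Hypothetical ticket text for demonstration.
ticket = ("Hello support team. Since the last update our VPN client fails "
          "with error 0x80070005. The VPN client disconnects every few "
          "minutes. Thanks in advance.")
print(extractive_summary(ticket, k=2))
```

Because sentences are selected rather than generated, the error code survives unchanged, while the greeting and sign-off score lower and are dropped. An abstractive model such as BART produces new text, which is more fluent but can rephrase or omit such details.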

Date of Award

2025

Full Publication Date

2025

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis

Included in

Data Science Commons
