Supervisor

Taufique Ahmed

Programme

MSc in Data Analytics

Subject

Computer Science

Abstract

This thesis explores the feasibility of applying data analytics techniques to chess, with the aims of profiling player styles and building a comprehensive chess analytics platform. Using a data set of over 20,000 anonymized games, the study combined feature engineering, classification and visual analytics to gain more insight into how players make decisions. Pre-processing involved parsing Portable Game Notation (PGN) files and engineering positional features (material imbalance, pawn structure, king safety and piece mobility). Each game was also scored with the Stockfish engine to compute Average Centipawn Loss (ACPL). ACPL and SDPL served as quantitative measures of playing accuracy and consistency, respectively. Both statistics were correlated with Elo ratings, which supports their validity as performance measures. To classify player styles, convolutional neural networks (CNNs) were trained on heatmap representations of move density. The networks distinguished aggressive, defensive and balanced styles with 83.4 percent accuracy. Machine learning algorithms such as Random Forests and K-Means clustering added interpretability and supported exploratory analysis. Further visualizations, including heatmaps, evaluation trendlines and radar charts, followed Tufte's design principles of clarity and simplicity, so that players could easily digest and interpret the output. An interactive dashboard prototype brought all of these capabilities together. It lets users (i.e., players, coaches and educators) explore play styles, performance trends and positional imbalances in real time. Ethical concerns were addressed by anonymizing participant data in line with the principles of the EU Artificial Intelligence Act and GDPR requirements.
These findings show that chess is a rich platform for data analytics. It offers structured data and domain-specific evaluation metrics, and machine learning can build on both to deepen analysis within chess and across sports analytics more broadly. The research shows how a data-driven approach can improve our understanding of human decision-making and player performance in competitive contexts.
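The abstract names ACPL and SDPL as accuracy and consistency metrics but does not spell out how they are computed. The following is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes:

- Each game has already been scored by an engine such as Stockfish, giving one evaluation in centipawns from White's point of view for every position.
- A move's loss is approximated as the drop in evaluation from the position before the move to the position after it, clipped at zero.
- Extreme scores (such as converted mate scores) are clamped to a hypothetical cap of 1000 centipawns.
- SDPL is read as the population standard deviation of those per-move losses. The abstract does not expand the acronym.

The evaluations are plain integers here so that the sketch runs without an engine.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def centipawn_losses(evals_cp: Sequence[int], side: str = "white", cap: int = 1000) -> List[int]:
    """Per-move centipawn loss for one side (illustrative helper, not from the thesis).

    evals_cp[i] is the engine evaluation, in centipawns from White's point of view,
    of the position before ply i; the last entry is the final position.
    """
    if side not in ("white", "black"):
        raise ValueError("side must be 'white' or 'black'")
    # Clamp extreme scores (assumed cap) so a single decisive blunder
    # does not dominate the average.
    clamped = [max(-cap, min(cap, e)) for e in evals_cp]
    # White moves on even plies (0, 2, ...), Black on odd plies (1, 3, ...).
    start = 0 if side == "white" else 1
    losses = []
    for i in range(start, len(clamped) - 1, 2):
        before, after = clamped[i], clamped[i + 1]
        # White wants the evaluation to stay high; Black wants it to stay low.
        drop = before - after if side == "white" else after - before
        # Moves that improve the evaluation count as zero loss.
        losses.append(max(0, drop))
    return losses


def acpl_and_sd(evals_cp: Sequence[int], side: str = "white") -> Tuple[float, float]:
    """Return (ACPL, standard deviation of per-move loss) for one side."""
    losses = centipawn_losses(evals_cp, side)
    if not losses:
        return 0.0, 0.0
    return float(mean(losses)), float(pstdev(losses))


if __name__ == "__main__":
    # Hypothetical evaluations for a six-ply game fragment (seven positions).
    evals = [30, 20, 40, -60, -50, -50, -250]
    print(acpl_and_sd(evals, "white"))  # roughly (36.7, 45.0)
    print(acpl_and_sd(evals, "black"))  # roughly (10.0, 8.2)
```

For the sample fragment, White's per-move losses are 10, 100 and 0 centipawns. Black's are 20, 10 and 0 centipawns, because Black's final move pushed the evaluation in Black's favour and so counts as zero loss.

A lower ACPL indicates more accurate play. A lower standard deviation indicates more consistent play, with fewer isolated blunders.

Comparing consecutive evaluations is a common shortcut. It matches the gap to the engine's best move only if the score before the move is the score of the engine's best line.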

Date of Award

2025

Full Publication Date

2025

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis

Included in

Data Science Commons
