Supervisor

David McQuaid

Programme

MSc in Data Analytics

Subject

Computer Science

Abstract

This study investigates whether clustering and topic modeling can uncover themes within the 167 verses of the King James Version of the Book of Esther. A standardized preprocessing pipeline was applied to the verses. Topic models (NMF, GSDMM, BERTopic, Top2Vec) and clustering algorithms (HDBSCAN, DBSCAN, Gaussian Mixture Models, Agglomerative Clustering) were then evaluated on TF-IDF and sentence-embedding feature spaces, using topic coherence, cluster validity, lexical distinctiveness, and cross-model similarity metrics. NMF produced the most interpretable topics, GSDMM yielded compact topics with little overlap, BERTopic offered entity-centered groupings, and Top2Vec highlighted broad thematic regions. Cross-model analysis revealed shared motifs, notably decree and banquet themes, but clusters and topics did not align exactly, which points to the complementary strengths of the two approaches for literary analysis.
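The following is a minimal sketch of the TF-IDF plus NMF combination named in the abstract. It uses only the Python standard library. The smoothed IDF formula, the Lee-Seung multiplicative-update solver, the toy input strings and the helper names (`tfidf`, `nmf`, `top_words`) are illustrative assumptions. They are not the thesis's actual pipeline, parameters or data. A real analysis would more likely rely on a library implementation such as scikit-learn's `TfidfVectorizer` and `NMF`.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Smoothed TF-IDF, idf = ln((1 + n) / (1 + df)) + 1, with L2-normalised rows (assumed scheme)."""
    tokenised = [d.lower().split()for d in docs]
    vocab = sorted({w for toks in tokenised for w in toks})
    n = len(docs)
    df = Counter(w for toks in tokenised for w in set(toks))  # document frequency per term
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for toks in tokenised:
        tf = Counter(toks)  # raw term counts in this document
        row = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows, vocab


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]


def transpose(a):
    return [list(c) for c in zip(*a)]


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) into W (m x k) and H (k x n) by
    multiplicative updates that reduce the Frobenius error ||V - WH||."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(m)]
    h = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # both k x n
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # both m x k
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h


def top_words(h, vocab, n=3):
    """The n highest-weighted terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]] for row in h]


# Hypothetical toy "verses" (paraphrased placeholders, not the KJV text).
docs = [
    "the king made a feast for all his princes",
    "the queen made a banquet for the king",
    "the king sealed the decree with his ring",
    "letters were sent with the decree to every province",
]
v, vocab = tfidf(docs)
w, h = nmf(v, k=2)
print(top_words(h, vocab))
```

Each row of `H` can be read as a topic. Each row of `W` gives a verse's weight on those topics, so assigning a verse to its largest-weight topic yields a hard grouping that could be compared against clustering output.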

Date of Award

2025

Full Publication Date

2025

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis

Included in

Data Science Commons
