Supervisor
David McQuaid
Programme
MSc in Data Analytics
Subject
Computer Science
Abstract
This study investigates whether clustering and topic modeling can uncover themes within the 167 verses of the King James Version of the Book of Esther. A standardized preprocessing pipeline was applied, and topic extraction methods (NMF, GSDMM, BERTopic, Top2Vec) and clustering methods (HDBSCAN, DBSCAN, Gaussian mixture models, agglomerative clustering) were evaluated on TF-IDF and sentence-embedding feature spaces using coherence, cluster validity, lexical distinctiveness, and cross-model similarity metrics. NMF provided the most interpretable topics, GSDMM yielded compact, low-overlap topics, BERTopic offered entity-centered groupings, and Top2Vec highlighted broad thematic regions. Cross-model analysis revealed overlapping motifs, notably decree and banquet themes, but clusters and topics did not align perfectly, which suggests that the two approaches have complementary strengths for literary analysis.
Date of Award
2025
Full Publication Date
2025
Access Rights
open access
Document Type
Capstone Project
Resource Type
thesis
Recommended Citation
Byrne, L. (2025) Discovering Latent Themes: Mixed-Methods Comparative Analysis of Topic Extraction and Clustering. CCT College Dublin.
DOI: https://doi.org/10.63227/652.299.92