Supervisor

David McQuaid

Programme

MSc in Data Analytics

Subject

Computer Science

Abstract

This study investigates whether clustering and topic modeling can uncover themes within the 167 verses of the King James Version of the Book of Esther. After a standardized preprocessing pipeline, the verses were represented in TF-IDF and sentence-embedding feature spaces. Four topic-extraction methods (NMF, GSDMM, BERTopic, Top2Vec) and four clustering methods (HDBSCAN, DBSCAN, Gaussian mixture models, agglomerative clustering) were evaluated using coherence, cluster validity, lexical distinctiveness, and cross-model similarity metrics. NMF provided the most interpretable topics, GSDMM yielded compact, low-overlap topics, BERTopic offered entity-centered groupings, and Top2Vec highlighted broad thematic regions. Cross-model analysis revealed overlapping motifs, notably decree and banquet themes, but clusters and topics did not align perfectly, which suggests that the two approaches are complementary for literary analysis.

Date of Award

2025

Full Publication Date

2025

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis

Recommended Citation

Byrne, L. (2025) Discovering Latent Themes: Mixed-Methods Comparative Analysis of Topic Extraction and Clustering. CCT College Dublin. DOI: https://doi.org/10.63227/652.299.92

Included in

Data Science Commons

Discovering Latent Themes: Mixed-Methods Comparative Analysis of Topic Extraction and Clustering.
