Supervisor

Dr. Muhammad Iqbal

Programme

MSc in Data Analytics

Subject

Computer Science

Abstract

This study investigates combining sentiment analysis, using the VADER lexicon, with semantic analysis, using Latent Dirichlet Allocation (LDA), to identify real-life events in Twitter datasets. The results show that sentiment analysis alone may be insufficient, and that combining it with semantic analysis improves event identification, particularly for finding relevant news articles and understanding brand perception on social media. The study also fine-tunes the RoBERTa model for question answering, yielding substantial gains on the SQuAD evaluation metrics: the exact match (EM) score rose from 2.06% to 62%, and the F1 score from 9.41% to 65%. A retrieval-and-generation system was then developed to extract and generate question-and-answer responses from news articles, using both the original and the fine-tuned model. Initial results showed an EM score of 4.5% and an F1 score of 24.26% for the original model, while the fine-tuned model raised the EM score to 17% but lowered the F1 score to 21.85%. These findings suggest possible catastrophic forgetting and indicate a need for further refinement to balance improved subjective question-answering capability with overall knowledge retention.
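
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how SQuAD-style exact match and token-level F1 are commonly computed for a single prediction. It is not taken from the thesis; the function names (`normalize`, `exact_match`, `f1_score`) and the sample strings are illustrative assumptions, and the normalisation steps follow the usual SQuAD convention (lowercasing, stripping punctuation and English articles).

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical answers, not drawn from the study's data.
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(f1_score("in Paris, France", "Paris"))
```

Because EM rewards only complete agreement while F1 gives partial credit for overlapping tokens, the two scores can move in opposite directions when averaged over a dataset, as they do in the news-article results reported above.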

Date of Award

2024

Full Publication Date

2024

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis
