Supervisor
Dr. Muhammad Iqbal
Programme
MSc in Data Analytics
Subject
Computer Science
Abstract
This study investigates combining sentiment analysis, using the VADER lexicon, with semantic analysis, using Latent Dirichlet Allocation (LDA), to identify real-life events in Twitter datasets. The research shows that while sentiment analysis alone may be insufficient, combining it with semantic analysis improves the process, particularly for identifying relevant news articles and understanding brand perception on social media. The study also fine-tunes the RoBERTa model for question-answering tasks, yielding significant improvements on the SQuAD evaluation metrics: the exact match (EM) score rose from 2.06% to 62%, and the F1 score improved from 9.41% to 65%. A retrieval and generation system was then developed to extract and generate question-and-answer responses from news articles using both the original and the fine-tuned model. Initial results showed an EM score of 4.5% and an F1 score of 24.26% for the original model; the fine-tuned model raised the EM score to 17%, though its F1 score fell to 21.85%. These findings suggest possible catastrophic forgetting and indicate a need for further refinement to balance improved subjective question-answering capabilities with overall knowledge retention.
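For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how SQuAD-style exact match (EM) and token-level F1 are commonly computed for a single prediction. It is not the thesis's own evaluation code. The function names (`normalize`, `exact_match`, `f1_score`) and the example strings in the usage note are hypothetical and chosen only for illustration; the normalisation steps (lowercasing, removing punctuation and English articles) follow the usual SQuAD convention.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between the two answers."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

With the made-up pair `"the index rose sharply"` (prediction) and `"index rose"` (reference), `exact_match` returns 0.0 while `f1_score` gives partial credit for the two shared tokens. Because EM is all-or-nothing and F1 rewards partial overlap, corpus-level averages of the two metrics need not move in the same direction.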
Date of Award
2024
Full Publication Date
2024
Access Rights
open access
Document Type
Capstone Project
Resource Type
thesis
Recommended Citation
Timur, Kagan, "Data Analysis of Twitter’s Nasdaq100 Sentiments and Topics as Indicators for News Articles Retrieval: Fine-Tuning RoBERTa and RAG." (2024). ICT. 59.
https://arc.cct.ie/ict/59