ICT

Data Analysis of Twitter’s Nasdaq100 Sentiments and Topics as Indicators for News Articles Retrieval: Fine-Tuning RoBERTa and RAG.

Author

Kagan Timur, CCT College Dublin

Supervisor

Dr. Muhammad Iqbal

Programme

MSc in Data Analytics

Subject

Computer Science

Abstract

This study investigates whether combining sentiment analysis, using the VADER lexicon, with semantic analysis, using Latent Dirichlet Allocation (LDA), can identify real-life events in Twitter datasets. The research shows that while sentiment analysis alone may be insufficient, combining it with semantic analysis improves the process, particularly for identifying relevant news articles and understanding brand perception on social media. The study also fine-tunes the RoBERTa model for question-answering tasks, yielding significant improvements on the SQuAD evaluation metrics: the exact match (EM) score rose from 2.06% to 62%, and the F1 score rose from 9.41% to 65%. A retrieval and generation system was then developed to extract and generate question-and-answer responses from news articles, using both the original and the fine-tuned model. On this system, the original model achieved an EM score of 4.5% and an F1 score of 24.26%, while the fine-tuned model raised the EM score to 17% but lowered the F1 score to 21.85%. These findings suggest the potential for catastrophic forgetting, indicating a need for further refinement to balance improved subjective question-answering capabilities with overall knowledge retention.
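
The abstract reports both exact match (EM) and F1, and the two can move in opposite directions. The sketch below shows how SQuAD-style EM and token-level F1 are commonly computed, as an illustration of why that can happen. It is not the thesis's own evaluation code: the function names and the example strings are hypothetical, and the normalisation steps (lowercasing, stripping punctuation and English articles) are assumed to follow the usual SQuAD convention.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalised strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: a partially correct answer scores 0 on EM
# but still earns partial credit on F1.
em = exact_match("shares rose sharply", "shares rose")
f1 = f1_score("shares rose sharply", "shares rose")
```

Under these definitions, a model that returns more answers that are exactly right but fewer that are partially right can raise its EM while lowering its average F1, which is one possible reading of the pattern reported for the retrieval system.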

Date of Award

2024

Full Publication Date

2024

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis

Recommended Citation

Timur, K. (2024) Data Analysis of Twitter’s Nasdaq100 Sentiments and Topics as Indicators for News Articles Retrieval: Fine-Tuning RoBERTa and RAG. CCT College Dublin. DOI: https://doi.org/10.63227/478.662.81

Included in

Computer Sciences Commons, Data Science Commons
