Supervisor
James Garza
Programme
MSc in Data Analytics
Subject
Computer Science
Abstract
This research explores the development and evaluation of Query-Focused Summarization (QFS) techniques for educational content, using a range of NLP models: traditional extractive and abstractive methods as well as Small and Large Language Models (SLMs and LLMs). The study evaluates the effectiveness of text preprocessing strategies by comparing original, minimally processed, and fully lemmatized text across these models. To improve scalability and cost-efficiency, Retrieval-Augmented Generation (RAG) frameworks were applied, reducing the number of input tokens while maintaining high summarization quality.
The models were tested on a general-purpose QFS dataset used to simulate educational scenarios. The key findings show that LLMs, particularly Llama 70B and GPT-4o, performed best, though SLMs such as GPT-4o mini offered a strong balance between cost and performance. Advanced prompt engineering and optimization techniques further improved model output.
The study also incorporated primary research with educational stakeholders, which shaped a practical framework for implementing NLP-based summarization tools in educational environments. This framework emphasizes flexible, scalable systems that promote transparency between teachers and learners while addressing the need for ethical AI practices. The research offers a foundation for developing NLP-driven tools that enhance learner engagement and comprehension, and lays the groundwork for further experimentation and real-world integration.
Date of Award
2024
Full Publication Date
2024
Access Rights
open access
Document Type
Capstone Project
Resource Type
thesis
Recommended Citation
Akram Sawiras, Kirillos, "Evaluation and Development of Innovative NLP Techniques for Query-Focused Summarization Using Retrieval Augmented Generation (RAG) and a Small Language Model (SLM) in Educational Settings" (2024). ICT. 66.
https://arc.cct.ie/ict/66