Supervisor

James Garza

Programme

MSc in Data Analytics

Subject

Computer Science

Abstract

This research explores the development and evaluation of Query-Focused Summarization (QFS) techniques for educational content, drawing on a range of NLP approaches, including traditional extractive and abstractive methods as well as Small and Large Language Models (SLMs and LLMs). The study evaluates the impact of text preprocessing strategies by comparing original, minimally processed, and fully lemmatized text across these models. To improve scalability and cost-efficiency, Retrieval-Augmented Generation (RAG) frameworks were applied, reducing input tokens while maintaining high summarization quality.
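
The sketch below illustrates the kind of preprocessing contrast described above, producing original, minimally processed, and fully lemmatized variants of the same text. The use of spaCy and the specific normalization steps are assumptions for illustration, not the thesis's actual pipeline.

```python
# Minimal sketch of the three preprocessing variants compared in the study.
# spaCy and the exact steps are illustrative assumptions, not the thesis pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")

def minimal_process(text: str) -> str:
    """Light normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

def full_lemmatize(text: str) -> str:
    """Full lemmatization: replace each token with its lemma, dropping punctuation."""
    doc = nlp(text)
    return " ".join(tok.lemma_.lower() for tok in doc if not tok.is_punct)

if __name__ == "__main__":
    original = "Photosynthesis converts light energy into chemical energy stored in glucose."
    print(original)                    # original text, passed to the model unchanged
    print(minimal_process(original))   # minimally processed variant
    print(full_lemmatize(original))    # fully lemmatized variant
```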

The research tested these models on a general-purpose QFS dataset designed to simulate educational scenarios. The findings highlight the superior performance of LLMs, particularly Llama 70B and GPT-4o, while SLMs such as GPT-4o mini demonstrated a strong balance between cost and performance. The integration of advanced prompt engineering and optimization techniques further improved model output.
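
As a rough illustration of a query-focused summarization request to one of the evaluated models, a minimal sketch follows. The prompt wording, temperature, and use of the OpenAI Python client are assumptions for illustration; only the GPT-4o mini model name comes from the abstract.

```python
# Hedged sketch of a query-focused summarization request.
# Prompt wording and parameters are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def query_focused_summary(query: str, document: str) -> str:
    """Ask the model to summarize a document with respect to a specific query."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.2,
        messages=[
            {"role": "system",
             "content": "You summarize educational material, keeping only content "
                        "relevant to the student's query."},
            {"role": "user",
             "content": f"Query: {query}\n\nDocument:\n{document}\n\n"
                        "Write a concise, query-focused summary."},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(query_focused_summary(
        "How do plants store energy?",
        "Photosynthesis converts light energy into chemical energy stored in glucose...",
    ))
```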

The study also incorporated primary research with educational stakeholders, shaping a practical framework for the implementation of NLP-based summarization tools in educational environments. This framework emphasizes the importance of flexible, scalable systems that promote transparency between teachers and learners while addressing the need for ethical AI practices. The research offers a solid foundation for developing NLP-driven tools that enhance learner engagement and comprehension, laying the groundwork for further experimentation and real-world integration.

Date of Award

2024

Full Publication Date

2024

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis
