Supervisor

James Garza

Programme

MSc in Data Analytics

Subject

Computer Science

Abstract

This research explores the development and evaluation of Query-Focused Summarization (QFS) techniques for educational content, drawing on a range of NLP approaches, including traditional extractive and abstractive methods as well as Small and Large Language Models (SLMs and LLMs). The study evaluates the impact of text preprocessing strategies by comparing original, minimally processed, and fully lemmatized text across these models. To improve scalability and cost-efficiency, Retrieval-Augmented Generation (RAG) frameworks were applied, reducing input tokens while maintaining high summarization quality.
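
The sketch below illustrates the kind of preprocessing contrast described above, producing original, minimally processed, and fully lemmatized variants of the same text. The use of spaCy and the specific normalization steps are assumptions for illustration, not the thesis's actual pipeline.

```python
# Minimal sketch of the three preprocessing variants compared in the study.
# spaCy and the exact steps are illustrative assumptions, not the thesis pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")

def minimal_process(text: str) -> str:
    """Light normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

def full_lemmatize(text: str) -> str:
    """Full lemmatization: replace each token with its lemma, dropping punctuation."""
    doc = nlp(text)
    return " ".join(tok.lemma_.lower() for tok in doc if not tok.is_punct)

if __name__ == "__main__":
    original = "Photosynthesis converts light energy into chemical energy stored in glucose."
    print(original)                    # original text, passed to the model unchanged
    print(minimal_process(original))   # minimally processed variant
    print(full_lemmatize(original))    # fully lemmatized variant
```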

The research tested these models on a general-purpose QFS dataset designed to simulate educational scenarios. The findings highlight the superior performance of LLMs, particularly Llama 70B and GPT-4o, while SLMs such as GPT-4o mini demonstrated a strong balance between cost and performance. The integration of advanced prompt engineering and optimization techniques further improved model output.
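
As a rough illustration of a query-focused summarization request to one of the evaluated models, a minimal sketch follows. The prompt wording, temperature, and use of the OpenAI Python client are assumptions for illustration; only the GPT-4o mini model name comes from the abstract.

```python
# Hedged sketch of a query-focused summarization request.
# Prompt wording and parameters are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def query_focused_summary(query: str, document: str) -> str:
    """Ask the model to summarize a document with respect to a specific query."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.2,
        messages=[
            {"role": "system",
             "content": "You summarize educational material, keeping only content "
                        "relevant to the student's query."},
            {"role": "user",
             "content": f"Query: {query}\n\nDocument:\n{document}\n\n"
                        "Write a concise, query-focused summary."},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(query_focused_summary(
        "How do plants store energy?",
        "Photosynthesis converts light energy into chemical energy stored in glucose...",
    ))
```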

The study also incorporated primary research with educational stakeholders, shaping a practical framework for the implementation of NLP-based summarization tools in educational environments. This framework emphasizes the importance of flexible, scalable systems that promote transparency between teachers and learners while addressing the need for ethical AI practices. The research offers a solid foundation for developing NLP-driven tools that enhance learner engagement and comprehension, laying the groundwork for further experimentation and real-world integration.

Date of Award

2024

Full Publication Date

2024

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis
