Supervisor

Kislay Raj

Programme

MSc in Data Analytics

Abstract

This study predicts electricity consumption using data analytics and ensemble learning methods, addressing fluctuations driven by external economic factors. Techniques such as Gradient Boosting Regressor (GBR) and Random Forest Regressor (RFR) proved effective because they generalise well to new data. CRISP-DM served as the guiding methodology, supported by preprocessing steps: winsorisation to handle outliers, feature selection to refine variables, and scaling to standardise data for improved model performance.
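
As an illustration of the winsorisation step mentioned here, the following is a minimal sketch in plain Python. The function name `winsorise`, the percentile cut-offs, and the sample readings are assumptions chosen for the example; the abstract does not state which limits or tooling the study used.

```python
def winsorise(values, lower_pct=0.05, upper_pct=0.95):
    """Clip values to the given lower and upper percentiles.

    Percentiles are taken by rank position in the sorted data
    (an assumed convention; other definitions interpolate).
    """
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[round(lower_pct * (n - 1))]
    high = ordered[round(upper_pct * (n - 1))]
    # Values outside [low, high] are pulled back to the nearest bound.
    return [min(max(v, low), high) for v in values]


# Hypothetical consumption readings with one extreme outlier.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
clipped = winsorise(readings, lower_pct=0.1, upper_pct=0.9)
```

Unlike dropping outliers, this keeps every observation and only limits its magnitude, which can matter when the dataset is small.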

The research used datasets from non-residential clients and data centres, and visualisations in Tableau revealed consumption patterns. Analysis showed that Counties Dublin and Kildare were among the highest electricity consumers in Leinster from 2015 to 2022. Feature selection improved model accuracy by removing variables with low correlation to the target, while preprocessing steps such as one-hot encoding and data scaling prepared the input for the regression models.
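
Correlation-based feature selection of the kind described can be sketched as follows. This is an illustrative example only: the threshold of 0.3, the helper names, and the feature names `floor_area` and `noise` are hypothetical and do not come from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def select_features(features, target, threshold=0.3):
    """Keep features whose absolute correlation with the target meets the threshold."""
    return [
        name
        for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    ]


# Hypothetical toy data: one informative feature, one uninformative.
target = [10, 20, 30, 40, 50]
features = {
    "floor_area": [1, 2, 3, 4, 5],
    "noise": [3, 1, 3, 1, 3],
}
kept = select_features(features, target)
```

Pearson correlation only captures linear association, so a filter like this can discard variables that relate to the target non-linearly.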

Results highlighted the predictive strength of ensemble methods, with GBR and RFR achieving high R² scores, low RMSE, and robust cross-validation performance. GBR in particular performed reliably across data subsets and showed balanced training and testing accuracy. While the study reinforced existing insights into ensemble learning's capabilities, it also demonstrated the practicality of these models for handling tabular data and extracting actionable findings from limited datasets.
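
For reference, the two metrics named above can be computed as in the sketch below. The sample values are made up for illustration and are not results from the study.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: share of variance explained by the model."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted consumption values.
observed = [3, 5, 7, 9]
predicted = [2, 5, 8, 9]
error = rmse(observed, predicted)
explained = r2_score(observed, predicted)
```

Comparing these metrics between training and test data, or across cross-validation folds, is one way to check the balance between training and testing accuracy described above.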

Date of Award

2024

Full Publication Date

2024

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis
