Date of Award

2023

Document Type

Capstone Project

Programme

MSc in Data Analytics

Supervisor

Vikas Tomer

Abstract

Accurate One Day-Ahead Demand Forecasting (ODADF) is crucial for electrical network reliability, the environment, and trading markets. While individual models struggle to achieve accurate predictions, ensemble learning models have emerged as a potential solution. They have been applied successfully to ODADF in several countries; however, no such research has been conducted for the Irish power system. To fill this gap, the research objectives were to develop a framework of ensemble learning models, evaluate their performance, and examine their potential for ODADF in Ireland. Experimentation was selected as the primary research methodology, and CRISP-DM as the project management framework. The framework was developed to balance the performance of the configurations against their computational complexity. Three stacking approaches were considered: classifiers as meta-learners, regressors as meta-learners, and heuristic rules. Various potential base-learners were considered, and two methods of supervised problem creation, based on the Similar Day (SD) and Moving Window (MW) approaches, were proposed to enhance their pattern recognition in the data. The cause-and-effect relationship between ensemble configurations and performance metrics for ODADF in Ireland was established, and the integration method emerged as the primary causal variable. The research methodology was divided into three phases: data preparation, experimentation with ensemble architectures, and validation of results. Data preparation included temporal feature extraction, Daylight Saving Time removal, and replacement of missing data and outliers. The results were validated by performance metrics, visual comparison with SDs from neighbouring weeks, and comparison of distributions before and after processing. Lagged weekly and daily demand was investigated for the SD approach, and window size for the MW approach.
Following an investigation into the correlation between lagged weather variables and demand, temperature, relative humidity and wind speed, each lagged by 39 hours, were selected as exogenous features. As weather data was distributed locally, three approaches to representative stations were proposed. A correlation study and a comparison of distributions showed that scaling the time series, and encoding temporal features in cyclical and vector formats, were beneficial to ODADF. Feature selection was performed separately for the SD and MW approaches. Given that data from 2020 was found to be an outlier, the datasets were split primarily into training and testing datasets, covering 2014-2019 and 2021-2022, respectively. Experimentation with base-learners and three integration methods was then performed. The training and testing datasets were further split into training and validation subsets, covering 2014-2018 and 2019, and 2021-2022, respectively. Bayesian optimisation with 10-fold cross-validation was selected for hyperparameter tuning. Potential base-learners were tuned, trained and evaluated on the training datasets, and the twenty most promising ones were selected as base-learners. Their Mean Absolute Percentage Error (MAPE) fluctuated across days of the week, months and hours. Potential ensembles were tuned, trained and evaluated on the base-learners’ predictions for 2015-2019 and on the training datasets, respectively. In the validation phase, base-learners and classification-based ensembles, with hyperparameters carried over from the previous phase, were refitted on unseen data and on the base-learners’ predictions for 2021-2022, respectively, and evaluated on 2022. The results demonstrated the high potential of classification-based ensembles for ODADF in Ireland. Ensembles of twenty base-learners, with SVM and MLP classifiers as meta-learners, stood out as the most effective solution for ODADF in Ireland.
Both achieved the lowest MAPE of 1.91%, an 11.2% relative improvement over the best base-learner, SVM (SD), which registered a MAPE of 2.15%. While the introduction of the SD and MW approaches amplified the diversity of the base-learners’ predictions, incorporating virtual weather stations benefited the performance of the classification-based ensembles. These ensembles not only harnessed the combined strengths of the individual base-learners but also mitigated their potential inconsistencies, producing predictions aligned with the distribution of actual demand. Finally, while this research addressed the gap in knowledge, further work using a wider variety of base-learners and integration methods is needed to bridge this gap comprehensively.
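
The reported improvement can be checked with a short calculation. The sketch below assumes the standard definition of MAPE and a relative (not percentage-point) improvement; the function names `mape` and `relative_improvement` and the sample demand values are illustrative and are not taken from the project.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (standard definition, assumed here)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of |actual - predicted| / |actual| over all points, scaled to percent.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_mape, ensemble_mape):
    """Relative reduction in MAPE, in percent of the baseline value."""
    return 100.0 * (baseline_mape - ensemble_mape) / baseline_mape


# Figures from the abstract: best base-learner SVM (SD) at 2.15%, ensembles at 1.91%.
improvement = relative_improvement(2.15, 1.91)
print(f"Relative improvement: {improvement:.1f}%")

# Hypothetical demand values (MW), used only to show how mape() is called.
example_mape = mape([100.0, 200.0], [110.0, 190.0])
print(f"Example MAPE: {example_mape:.2f}%")
```

Under these assumptions, (2.15 - 1.91) / 2.15 rounds to 11.2%, which matches the figure quoted above.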
