Supervisor

Matt Lemon

Programme

MSc in Data Analytics

Subject

Computer Science

Abstract

This study investigates the prediction of multi-class S&P corporate credit ratings for European non-financial firms from 2010 to 2024 using a machine learning framework grounded in financial fundamentals. To ensure robustness and generalizability, the analysis excluded the Year variable, which was identified as a source of data leakage. After this correction, non-linear ensemble models demonstrated a clear advantage over linear baselines. The top-performing Random Forest model achieved a weighted F1-score of approximately 0.60, more than double the 0.26 achieved by the Logistic Regression baseline, with most misclassifications concentrated in adjacent rating categories. This indicates that, while precise distinctions between neighbouring ratings remain challenging, the models capture overall credit quality effectively. A SHAP (SHapley Additive exPlanations) analysis confirmed that predictions are driven by financially intuitive features, particularly long-term profitability (Retained_Earnings_to_Assets), firm size (Market Capitalization), and core earnings measures. These findings align with established credit risk theory, suggesting that a substantial portion of agency ratings is systematically associated with publicly available financial data. The research contributes a transparent and academically sound benchmark for quantitative credit risk assessment. While the results underscore the continuing importance of the qualitative and forward-looking judgment applied by credit rating agencies, the study demonstrates that machine learning can provide a valuable, interpretable complement for credit monitoring, screening, and early-warning applications.
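As an illustration of the two evaluation ideas mentioned in the abstract, the following minimal sketch computes a support-weighted F1-score and the share of misclassifications that fall in an adjacent rating category. It is not the study's code: the rating scale `RATING_ORDER`, the helper names `weighted_f1` and `adjacent_error_share`, and the toy labels are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical ordinal rating scale, best to worst (illustrative only;
# the thesis may group ratings differently).
RATING_ORDER = ["AAA", "AA", "A", "BBB", "BB", "B", "CCC"]


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    return total / len(y_true)


def adjacent_error_share(y_true, y_pred, order=RATING_ORDER):
    """Fraction of misclassifications that are exactly one category away."""
    rank = {label: i for i, label in enumerate(order)}
    errors = [(t, p) for t, p in zip(y_true, y_pred) if t != p]
    if not errors:
        return 0.0
    adjacent = sum(1 for t, p in errors if abs(rank[t] - rank[p]) == 1)
    return adjacent / len(errors)


# Toy labels, invented for this example (not data from the study).
y_true = ["A", "A", "BBB", "BBB", "BBB", "BB", "BB", "B"]
y_pred = ["A", "BBB", "BBB", "BBB", "BB", "BB", "B", "B"]

print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
print(f"adjacent share of errors: {adjacent_error_share(y_true, y_pred):.2f}")
```

A high adjacent share alongside a moderate weighted F1 is the pattern the abstract describes: the predicted rating is often wrong by one category but rarely far from the true one.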

Date of Award

2025

Full Publication Date

2025

Document Type

Capstone Project

Resource Type

thesis

Included in

Data Science Commons
