IFRS 9 ECL: Backtesting PD Model Accuracy & Discriminatory Power

Under IFRS 9, calculating Expected Credit Losses (ECL) hinges on a critical component: the Probability of Default (PD) model. These models are essentially your crystal ball, predicting the likelihood that a borrower defaults over a given horizon (12 months for Stage 1 exposures, lifetime for Stages 2 and 3). But how do you know if your crystal ball is clear, or if it's just reflecting shadows? That’s where robust backtesting comes in, specifically to assess a PD model’s discriminatory power and accuracy.

Backtesting isn't just a regulatory checkbox; it's a vital exercise to ensure your PD models are reliable, perform as expected, and provide credible inputs for your ECL figures. It helps validate the model's underlying assumptions and performance against actual observed outcomes, giving you confidence in your financial reporting and risk management decisions.

Assessing Discriminatory Power

Discriminatory power refers to a PD model's ability to differentiate between defaulting and non-defaulting obligors. Can it effectively separate the 'good' credits from the 'bad' ones? If your model can't do this well, its predictions won't be very useful.

Area Under the Receiver Operating Characteristic (AUROC) Curve

The AUROC is perhaps the most widely used metric for discriminatory power. It equals the probability that a randomly chosen defaulting obligor has a higher predicted PD than a randomly chosen non-defaulting obligor, with ties counted as one half. An AUROC of 0.5 means the model ranks no better than random guessing, while 1.0 indicates perfect discrimination. As a rule of thumb, PD models are often expected to reach an AUROC well above 0.7, although acceptable thresholds vary by portfolio and should be set in the institution's own validation standards.
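
The probabilistic definition above translates directly into code. The sketch below is a minimal illustration, assuming PDs and default flags (1 = default, 0 = no default) arrive as two plain Python lists; the function name `auroc` and the toy data are hypothetical. It compares every defaulter with every non-defaulter, which is O(n_bad × n_good) and only suitable for small samples. For full portfolios a rank-based method or a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.

```python
def auroc(pds, defaults):
    """AUROC as the probability that a randomly chosen defaulter has a higher
    predicted PD than a randomly chosen non-defaulter (ties count as 1/2)."""
    bad = [p for p, d in zip(pds, defaults) if d == 1]
    good = [p for p, d in zip(pds, defaults) if d == 0]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")

    score = 0.0
    for pb in bad:
        for pg in good:
            if pb > pg:
                score += 1.0  # defaulter correctly ranked as riskier
            elif pb == pg:
                score += 0.5  # tie: model cannot separate the pair
    return score / (len(bad) * len(good))


# Hypothetical toy portfolio: six obligors, two of which defaulted.
pds = [0.30, 0.20, 0.15, 0.10, 0.05, 0.02]
defaults = [1, 0, 1, 0, 0, 0]
print(auroc(pds, defaults))  # 0.875
```

Seven of the eight defaulter/non-defaulter pairs are ranked correctly; the only miss is the defaulter at 15% sitting below the non-defaulter at 20%.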

Gini Coefficient (or Accuracy Ratio)

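Closely related to AUROC, the Gini coefficient (also called the Accuracy Ratio) quantifies how well the model ranks obligors by risk. It is usually read from the Cumulative Accuracy Profile (CAP), which plots the cumulative share of all obligors, ranked from highest to lowest predicted PD, against the cumulative share of defaulters captured. The Gini is the area between the model's CAP and the random-model diagonal, divided by the same area for a perfect model. It ranges from 0 (no discriminatory power) to 1 (perfect discrimination); a negative value would mean the model ranks risk backwards. Because Gini = 2 * AUROC - 1, the two metrics carry the same information on different scales: a Gini of 0.4 corresponds to an AUROC of 0.7.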
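
A minimal sketch of the CAP-based calculation follows, assuming the same list inputs as before; `accuracy_ratio` and the toy data are hypothetical. Obligors with identical PDs are grouped into a single straight CAP segment, which is the CAP equivalent of counting ties as one half in the AUROC, so the result should agree with 2 * AUROC - 1.

```python
def accuracy_ratio(pds, defaults):
    """Accuracy ratio (Gini) computed from the Cumulative Accuracy Profile (CAP).

    Obligors are ranked from highest to lowest predicted PD; the CAP plots the
    cumulative share of obligors (x) against the cumulative share of defaulters (y)."""
    n = len(pds)
    n_bad = sum(defaults)
    if n_bad == 0 or n_bad == n:
        raise ValueError("need at least one defaulter and one non-defaulter")

    # Obligors with identical PDs cannot be ordered, so each tied group forms
    # one straight CAP segment (equivalent to counting ties as 1/2 in the AUROC).
    groups = {}
    for p, d in zip(pds, defaults):
        count, bads = groups.get(p, (0, 0))
        groups[p] = (count + 1, bads + d)

    area, y = 0.0, 0.0
    for p in sorted(groups, reverse=True):
        count, bads = groups[p]
        dx, dy = count / n, bads / n_bad
        area += dx * (2 * y + dy) / 2  # trapezoid under this CAP segment
        y += dy

    default_rate = n_bad / n
    # The random CAP has area 0.5; the perfect CAP has area 1 - default_rate / 2.
    return (area - 0.5) / (0.5 - default_rate / 2)


# Same hypothetical toy portfolio as in the AUROC sketch.
pds = [0.30, 0.20, 0.15, 0.10, 0.05, 0.02]
defaults = [1, 0, 1, 0, 0, 0]
print(round(accuracy_ratio(pds, defaults), 6))  # 0.75, i.e. 2 * 0.875 - 1
```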

Assessing Model Accuracy (Calibration)

While discriminatory power tells you if your model can rank risks, accuracy (often referred to as calibration) tells you how well its predicted PDs match the actual observed default rates. Does a predicted PD of 5% actually translate to 5% of those borrowers defaulting?

Calibration Plot / Reliability Ratio

One of the most intuitive ways to assess accuracy is a calibration plot. Group obligors into PD buckets (e.g., 0-1%, 1-2%, and so on) and compare the average predicted PD in each bucket with the observed default rate in that same bucket. Ideally, the two values are close. The reliability ratio (observed default rate / average predicted PD) should then be close to 1 in every bucket, bearing in mind that buckets with few obligors or few defaults produce noisy ratios.
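
The bucketing step can be sketched as below, assuming bucket edges are supplied as a sorted list of PD boundaries starting at or below the lowest PD; `calibration_table`, the edges and the data are all hypothetical. Each row holds the values you would plot (average predicted PD against observed default rate) plus the reliability ratio; drawing the plot itself, for example with matplotlib, is left out.

```python
from bisect import bisect_right


def calibration_table(pds, defaults, edges):
    """Compare average predicted PD with the observed default rate per PD bucket.

    Bucket i covers [edges[i], edges[i + 1]); a PD equal to edges[-1] goes into
    the last bucket. Assumes edges are sorted and edges[0] <= every PD."""
    n_buckets = len(edges) - 1
    counts = [0] * n_buckets
    pd_sums = [0.0] * n_buckets
    bad_counts = [0] * n_buckets
    for p, d in zip(pds, defaults):
        i = min(bisect_right(edges, p) - 1, n_buckets - 1)
        counts[i] += 1
        pd_sums[i] += p
        bad_counts[i] += d

    rows = []
    for i in range(n_buckets):
        if counts[i] == 0:
            continue  # empty bucket: nothing to compare
        avg_pd = pd_sums[i] / counts[i]
        odr = bad_counts[i] / counts[i]  # observed default rate
        ratio = odr / avg_pd if avg_pd > 0 else float("nan")
        rows.append((edges[i], edges[i + 1], counts[i], avg_pd, odr, ratio))
    return rows


# Hypothetical buckets 0-1%, 1-2%, 2-5% and a deliberately tiny sample.
edges = [0.0, 0.01, 0.02, 0.05]
pds = [0.005, 0.008, 0.015, 0.015, 0.03, 0.04]
defaults = [0, 0, 0, 1, 0, 1]
for lo, hi, n, avg_pd, odr, ratio in calibration_table(pds, defaults, edges):
    print(f"{lo:.0%}-{hi:.0%}  n={n}  avg PD={avg_pd:.2%}  ODR={odr:.2%}  ratio={ratio:.2f}")
```

With only two obligors per bucket, a single default swings the ratio wildly, which illustrates why sparse buckets need to be read with caution.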

Brier Score

The Brier score is a single accuracy measure that reflects both calibration and discrimination. It is the mean squared difference between the predicted PDs and the actual outcomes (1 for default, 0 for no default), so a lower Brier score indicates better accuracy. Because errors are squared, confident predictions that turn out wrong (for example, a 90% PD on an obligor that does not default) are penalized far more heavily than small misses.
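
The Brier score is a one-line calculation. The sketch below assumes the same list inputs as the earlier examples; `brier_score` and the data are hypothetical.

```python
def brier_score(pds, defaults):
    """Mean squared difference between predicted PD and outcome (1 = default, 0 = no default)."""
    if not pds or len(pds) != len(defaults):
        raise ValueError("pds and defaults must be non-empty and of equal length")
    return sum((p - d) ** 2 for p, d in zip(pds, defaults)) / len(pds)


pds = [0.30, 0.20, 0.15, 0.10]
defaults = [1, 0, 1, 0]
print(f"{brier_score(pds, defaults):.4f}")  # 0.3156

# Squaring makes a confident miss far costlier than a small one:
print(f"{brier_score([0.9], [0]):.2f} vs {brier_score([0.1], [0]):.2f}")  # 0.81 vs 0.01
```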

Hosmer-Lemeshow Test

For grouped data, the Hosmer-Lemeshow test is a goodness-of-fit test that checks whether observed default counts match expected default counts across risk groups, typically deciles of predicted PD. A high p-value (typically > 0.05) means the test does not reject the hypothesis that predicted and observed outcomes agree. That is consistent with good calibration but does not prove it, and the test has little power when default counts are small.
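
The sketch below computes the Hosmer-Lemeshow statistic over equal-count groups of sorted PDs, assuming every PD lies strictly between 0 and 1; `hosmer_lemeshow`, `chi2_sf` and the simulated data are hypothetical. To stay within the standard library, the chi-square tail probability is built from `math.erfc`, `math.exp` and `math.gamma` rather than `scipy.stats.chi2.sf`, which would give the same value. The degrees of freedom default to groups - 2, the classic setting for a model fitted on the same data; some validation teams use the number of groups instead when testing on data the model was not fitted on, so `df` is left as a parameter.

```python
import math
import random


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1.

    Starts from erfc (df = 1) or exp (df = 2) and steps up two degrees of freedom
    at a time with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where
    y = x / 2. Intended for moderate df (math.gamma overflows for a > ~170)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    y = x / 2.0
    if df % 2 == 0:
        sf, a = math.exp(-y), 1.0  # df = 2
    else:
        sf, a = math.erfc(math.sqrt(y)), 0.5  # df = 1
    while 2 * a < df:
        sf += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return sf


def hosmer_lemeshow(pds, defaults, groups=10, df=None):
    """Hosmer-Lemeshow statistic and p-value over equal-count groups of sorted PD."""
    pairs = sorted(zip(pds, defaults))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected defaults = sum of PDs
        observed = sum(d for _, d in chunk)
        p_bar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    if df is None:
        df = groups - 2
    return stat, chi2_sf(stat, df)


# Simulated portfolio whose defaults are drawn from the PDs themselves,
# so it is well calibrated by construction.
rng = random.Random(42)
pds = [rng.uniform(0.01, 0.20) for _ in range(5000)]
defaults = [1 if rng.random() < p else 0 for p in pds]
stat, p_value = hosmer_lemeshow(pds, defaults, groups=10)
print(f"HL = {stat:.2f}, p = {p_value:.3f}")
```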

Practical Backtesting Steps

To backtest effectively, you'll need a historical dataset of actual defaults corresponding to the periods for which your PD model made predictions. Segment your portfolio by relevant characteristics (e.g., rating grade, industry, loan type) to perform granular analysis. Recalculate and monitor these metrics at regular intervals, comparing them against internal thresholds and expectations. Any significant deviation warrants investigation and, where necessary, model recalibration.
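
The segmented monitoring step can be sketched as follows, assuming backtest records arrive as hypothetical `(segment, predicted_pd, defaulted)` tuples. The thresholds are illustrative placeholders only: `min_auroc=0.7` echoes the rule of thumb above, and the reliability-ratio band of 0.8 to 1.25 is an arbitrary example, not a regulatory or industry value. Real thresholds would come from the institution's validation policy.

```python
from collections import defaultdict


def segment_auroc(pds, defaults):
    """Pairwise AUROC; returns None when a segment has only one outcome class."""
    bad = [p for p, d in zip(pds, defaults) if d == 1]
    good = [p for p, d in zip(pds, defaults) if d == 0]
    if not bad or not good:
        return None
    wins = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return wins / (len(bad) * len(good))


def segment_backtest(records, min_auroc=0.7, ratio_band=(0.8, 1.25)):
    """records: iterable of (segment, predicted_pd, defaulted) tuples.

    The thresholds are illustrative placeholders, not regulatory values."""
    by_segment = defaultdict(lambda: ([], []))
    for segment, pd, defaulted in records:
        by_segment[segment][0].append(pd)
        by_segment[segment][1].append(defaulted)

    report = {}
    for segment, (pds, defaults) in by_segment.items():
        avg_pd = sum(pds) / len(pds)
        odr = sum(defaults) / len(defaults)  # observed default rate
        ratio = odr / avg_pd if avg_pd > 0 else float("nan")
        auc = segment_auroc(pds, defaults)
        flags = []
        if auc is not None and auc < min_auroc:
            flags.append("low AUROC")
        # NaN fails both comparisons, so an undefined ratio is also flagged.
        if not ratio_band[0] <= ratio <= ratio_band[1]:
            flags.append("calibration outside band")
        report[segment] = {"n": len(pds), "avg_pd": avg_pd, "odr": odr,
                           "ratio": ratio, "auroc": auc, "flags": flags}
    return report


# Hypothetical records for two segments.
records = [("A", 0.4, 1), ("A", 0.3, 0), ("A", 0.2, 0), ("A", 0.1, 0),
           ("B", 0.1, 0), ("B", 0.1, 0), ("B", 0.1, 0), ("B", 0.1, 1)]
for segment, row in segment_backtest(records).items():
    print(segment, row["flags"])
# A []
# B ['low AUROC', 'calibration outside band']
```

Segment B assigns the same PD to everyone, so it cannot discriminate at all (AUROC 0.5), and its observed default rate of 25% is well above the 10% predicted.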

Ultimately, rigorous backtesting of your PD models for both discriminatory power and accuracy is not just a regulatory obligation for IFRS 9 ECL; it's a cornerstone of sound credit risk management. By consistently applying these methodologies, you ensure your models are robust, reliable, and truly reflective of your portfolio's risk profile, leading to more accurate ECL calculations and better business decisions.
