Benchmarks for predictive power of retail underwriting models
-
Hello,
I am trying to find out whether there is a benchmarking study showing the average predictive power of retail underwriting models (ideally stating the average Gini for different client/product segments).
I am looking for a reference point to benchmark our model's predictive power against our peers.
Thank you!
-
Thanks for the question; it is one we see posed fairly frequently.
Gini benchmarks can depend on a few factors, namely:
- Existing customer vs. new-to-bank: Gini will be higher for existing customers thanks to behavioural and transaction data
- Availability of credit bureau or other external data providers, and PSD2 data: if there is no credit bureau or PSD2 data, a new-to-bank model is difficult to build and the attainable Gini is lower
- Time since development: discriminatory power will naturally decline over time, especially if the application population has changed significantly, due either to external/market-related factors or to the bank’s own underwriting standards
In our experience and from various surveys, Gini ranges for models that are older or have been in use for some time in the German and Belgian markets are roughly:
- Belgium – Existing customers – instalment loan – 50-55%
- Belgium – New-to-bank – instalment loan – 35-40%
- Belgium – Existing customers – credit card – 55-60%
- Germany – Mixed – instalment loan (unsecured) – 40-45%
- Germany – Mixed – car loan + home improvement loan – 60%
- Germany – Mixed – credit card / overdrafts – 45%
- Germany – Mixed – mortgages – 50-55%
In our validation standards, we’ve defined these thresholds for new model builds:
- 60%+ Gini – Green
- 40-60% – Amber
- <40% – Red
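
As a rough illustration of how the thresholds above might be applied, here is a minimal Python sketch. It assumes Gini is computed as 2 × AUC − 1 and that a higher score means higher risk. The function names (`auc_from_scores`, `gini`, `rag_status`) and the toy data are hypothetical, and treating exactly 40% as Amber and exactly 60% as Green is an assumption about the boundaries, not something the standards above spell out.

```python
def auc_from_scores(scores, defaults):
    """AUC by pairwise comparison of defaulters vs. non-defaulters.

    Assumes higher score = higher risk; ties count as half a win.
    Quadratic in portfolio size, so only suitable for small samples.
    """
    bads = [s for s, d in zip(scores, defaults) if d == 1]
    goods = [s for s, d in zip(scores, defaults) if d == 0]
    if not bads or not goods:
        raise ValueError("need at least one default and one non-default")
    wins = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bads) * len(goods))


def gini(scores, defaults):
    """Gini (accuracy ratio) as 2 * AUC - 1."""
    return 2.0 * auc_from_scores(scores, defaults) - 1.0


def rag_status(gini_value):
    """Map a Gini (as a fraction, e.g. 0.55) to the RAG bands quoted above.

    Boundary handling (0.40 -> Amber, 0.60 -> Green) is an assumption.
    """
    if gini_value >= 0.60:
        return "Green"
    if gini_value >= 0.40:
        return "Amber"
    return "Red"


# Toy data, purely illustrative (not from any real portfolio).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_defaults = [1, 1, 0, 1, 0, 0, 0, 0]
toy_gini = gini(toy_scores, toy_defaults)
print(f"Gini = {toy_gini:.1%}, status = {rag_status(toy_gini)}")
```

A toy sample this small will show an unrealistically high Gini; the point is only the mechanics of computing the metric and mapping it to a band.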
-
A few final caveats
- Expectations for Gini depend on data quality and portfolio type, which is why you’re not finding a standard threshold; it may be worth steering the people asking towards a better question
- Any PD model can show as much predictive power as you want, depending on how you calculate the metric, so you need to assess it holistically to determine whether it’s a good model. You’re looking for robustness, precision, representativeness of the data, etc. NB: in banking, the assumption “nobody defaults” is typically 97% accurate on back-testing, but that’s not true in retail. Retail data is also more heteroscedastic, so you need more segmental testing
- When banks say Gini they typically mean Corrected Gini, which is nearly the same as the uncorrected figure at low default rates but diverges from it when retail portfolios hit higher default rates
- Gini only measures rank ordering, so it doesn’t pick up accuracy improvements that leave the ordering unchanged, which is why many non-banking sectors have ditched it as a metric
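
The last two caveats can be sketched in a few lines of Python. This assumes that "Corrected Gini" means the CAP-based Gini divided by (1 − default rate), which is the usual accuracy-ratio definition but is not stated explicitly above. It also assumes that higher score means higher risk and that all scores are distinct. The function names and toy data are hypothetical.

```python
def cap_gini(scores, defaults):
    """Uncorrected Gini: twice the area between the CAP curve and the diagonal.

    Assumes higher score = higher risk and no tied scores.
    """
    n = len(scores)
    n_bad = sum(defaults)
    order = sorted(range(n), key=lambda i: -scores[i])  # riskiest first
    area, captured, prev_y = 0.0, 0, 0.0
    for i in order:
        captured += defaults[i]
        y = captured / n_bad                 # share of defaulters captured so far
        area += (prev_y + y) / 2.0 / n       # trapezoid of width 1/n
        prev_y = y
    return 2.0 * (area - 0.5)


def corrected_gini(scores, defaults):
    """Uncorrected Gini rescaled by its maximum attainable value, 1 - default rate."""
    default_rate = sum(defaults) / len(defaults)
    return cap_gini(scores, defaults) / (1.0 - default_rate)


def brier_score(pds, defaults):
    """Mean squared error of predicted PDs; sensitive to calibration, unlike Gini."""
    return sum((p - d) ** 2 for p, d in zip(pds, defaults)) / len(pds)


# Toy data, purely illustrative: a 37.5% default rate makes the gap visible.
pds = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
defaults = [1, 1, 0, 1, 0, 0, 0, 0]
print("uncorrected:", round(cap_gini(pds, defaults), 4))
print("corrected:  ", round(corrected_gini(pds, defaults), 4))

# Halving every PD keeps the rank order, so Gini is unchanged,
# while an accuracy measure such as the Brier score moves.
halved = [p * 0.5 for p in pds]
print("Gini unchanged:", corrected_gini(halved, defaults) == corrected_gini(pds, defaults))
print("Brier before/after:", brier_score(pds, defaults), brier_score(halved, defaults))
```

With a default rate of a few percent, the divisor (1 − default rate) is close to 1 and the two figures nearly coincide; the gap widens as the default rate rises.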