A couple of thoughts on this subject, from one of our experts:
The discrepancy arises because the adjustment implicitly assumes that the Bank would have had more defaults and lower scores (and so a worse average score), while the theorem is then applied to a population that still contains the same set of defaulted cases. The average scores are therefore not actually worse, and hence the predicted PD comes out lower than the target.
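One way to see the effect is a minimal simulation, assuming the adjustment in question is the standard Bayes-rule prior shift of each model PD from the sample default rate to the 2% long-run target; the score distribution below is purely illustrative.

```python
# Minimal sketch: applying a Bayes-rule prior shift to an unchanged population.
# Assumes the adjustment is the standard re-weighting from the sample default
# rate to the 2% target; all distribution parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
log_odds = rng.normal(loc=-5.2, scale=1.3, size=200_000)  # hypothetical scores
pd_model = 1.0 / (1.0 + np.exp(-log_odds))                # model PD per account

pi_sample = pd_model.mean()   # central tendency actually seen in the sample
pi_target = 0.02              # long-run average the Bank must anchor to

# Bayes-rule re-weighting of each PD from the sample prior to the target prior
k = pi_target / pi_sample
m = (1.0 - pi_target) / (1.0 - pi_sample)
pd_adjusted = k * pd_model / (k * pd_model + m * (1.0 - pd_model))

print(f"sample rate {pi_sample:.3%}, adjusted mean PD {pd_adjusted.mean():.3%}")
# The adjusted average lands below the 2% target: the formula implicitly assumes
# the score distribution itself would have been worse, but the scores (and the
# set of defaulted cases) are unchanged, so the average PD undershoots.
```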
There are at least two approaches to deal with this effect:
- Adjust the constant term in the logistic regression until the average PD hits the 2% target (see the sketch after this list)
- Run a "goal seek" analysis so that the average PD, after mapping scores to the Bank's grades and applying the appropriate post-rating adjustments, reaches the 2% target
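A minimal sketch of the first approach, assuming the model produces log-odds scores; the root-finder is in effect the same "goal seek" as the second bullet, applied directly to the intercept rather than to grade-level PDs. The score distribution and the helper name `calibrate_intercept` are illustrative.

```python
# Intercept adjustment as a one-dimensional goal seek: find the additive shift c
# such that the portfolio-average PD equals the 2% target.
import numpy as np
from scipy.optimize import brentq

def calibrate_intercept(log_odds, target_pd):
    """Return the shift c with mean(sigmoid(log_odds + c)) == target_pd."""
    gap = lambda c: np.mean(1.0 / (1.0 + np.exp(-(log_odds + c)))) - target_pd
    # A +/-20 bracket on the log-odds shift spans average PDs from ~0 to ~1.
    return brentq(gap, -20.0, 20.0)

rng = np.random.default_rng(0)
scores = rng.normal(loc=-5.2, scale=1.3, size=200_000)  # hypothetical portfolio
shift = calibrate_intercept(scores, target_pd=0.02)
pd_calibrated = 1.0 / (1.0 + np.exp(-(scores + shift)))
print(f"intercept shift {shift:+.4f}, mean PD {pd_calibrated.mean():.3%}")
```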
For European banks especially, IRB models are actually required to be quite conservative unless banks have "perfect" data, so the long-run average can become a moot point to a certain extent.
On the topic of perfect data: if the Bank has enough data and the PD model is really powerful, it should find that there is no straight-line relationship between the PD from the logistic model and the observed default rate. This is caused by the fact that, whilst the errors are broadly normally distributed in log-odds space, the sigmoid transformation into PD/default-rate space is non-linear, so the expected default rate ends up closer to the portfolio mean than the original point prediction (a Jensen's-inequality effect: low predicted PDs are pulled up, high ones down).
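A small simulation of that non-linearity, assuming (as the paragraph does) normally distributed errors in log-odds space; the unit noise scale is an illustrative choice.

```python
# Compare the naive prediction sigmoid(mu) with the expected default rate
# E[sigmoid(mu + eps)] when eps is normal noise in log-odds space.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
eps = rng.normal(scale=1.0, size=500_000)  # assumed error in log-odds space

for mu in (-5.0, -3.0, -1.0, 0.0, 1.0, 3.0, 5.0):
    predicted = sigmoid(mu)                # straight-line mapping, noise ignored
    expected = sigmoid(mu + eps).mean()    # expectation after the transform
    print(f"log-odds {mu:+.1f}: predicted {predicted:.4f}, expected {expected:.4f}")
# Low predictions are pulled up, high ones down: observed default rates sit
# closer to the middle than the point predictions, bending the relationship.
```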