C3w5 - interactions vs correlations between features

Regarding the Permutation Feature Importance video, it states from 2min03sec:

“Since by permuting the feature you also destroy the interaction effects with other features, it also shows the interactions between features. This means that it accounts for both the main feature effects and the interaction effects on model performance.”

But it then states from 3min0sec:

“Like PDP, correlated features are once again a problem.”

So, now it’s not clear to me if/how these two are different. The Permutation Feature Importance method can be used to show interactions between features, but at the same time this method assumes no correlation between features…? What am I misunderstanding here, please?

Hi @shahin,

To explain my understanding, let me give an example. Assume there are two features, A and B. If only feature A is permuted, model performance is 40%. If only feature B is permuted, model performance is 45%. If neither feature is permuted, model performance is 90%. So in this example the most important feature is A, then B, because permuting A causes the larger drop. But performance reaches its peak only when both features are intact, so there is some interaction between the features that lets the model reach that peak. A and B are not correlated here: if A were strongly correlated with B, we would need only one of them, and having both would not give the model a performance boost. The word “interaction” describes how the features jointly affect the prediction; it says nothing about the statistical relationship between the two features themselves.
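Here is a minimal sketch of the same idea in Python. Everything in it is made up for illustration: the helper names `accuracy` and `permutation_importance`, the toy `model`, and the data are my own assumptions, not the course's implementation, and the numbers will not match the 40% / 45% / 90% above. A and B are drawn independently (so they are uncorrelated), and the label depends only on the product A*B, which is a pure interaction.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows where the model's prediction matches the label."""
    hits = sum(model(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(model, rows, labels, col, rng):
    """Drop in accuracy after shuffling a single feature column."""
    baseline = accuracy(model, rows, labels)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    # Rebuild each row with the shuffled value in position `col`.
    permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
    return baseline - accuracy(model, permuted, labels)

rng = random.Random(0)

# A and B are sampled independently, so they are uncorrelated by construction.
rows = [(rng.choice([-1, 1]), rng.choice([-1, 1])) for _ in range(2000)]

# The label depends only on the product A*B: an interaction with no main effects.
labels = [int(a * b > 0) for a, b in rows]

def model(row):
    """Hypothetical 'trained' model that has learned the interaction exactly."""
    a, b = row
    return int(a * b > 0)

baseline = accuracy(model, rows, labels)
imp_a = permutation_importance(model, rows, labels, 0, rng)
imp_b = permutation_importance(model, rows, labels, 1, rng)

print("baseline accuracy:", baseline)
print("importance of A:", imp_a)
print("importance of B:", imp_b)
```

Under these assumptions, permuting either feature alone should pull accuracy down to roughly chance level, so both features get a large importance even though neither has any effect on its own. That is the sense in which permuting a feature also destroys its interaction effects.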

Hope it helps,
Cuong

OK, that makes sense, thanks.
So, … if the effects of A and B on prediction accuracy are cumulative, then PFI helps you to see that. However, this doesn’t mean A and B necessarily “interact” (e.g. colour of car and top speed), and it doesn’t depend on whether they correlate with each other or not. That said, PFI gives more reliable importance estimates when they don’t correlate.
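To check the correlation part for myself, here is a rough sketch of one way correlated features can distort PFI. All of it is hypothetical: B is an exact copy of A (the most extreme correlation possible), and the two hand-written "models" `uses_a_only` and `splits_a_and_b` stand in for trained models. Both make identical predictions on the real data, but the second spreads its weight over A and B.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of the model on the given rows."""
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, col, rng):
    """Increase in mean squared error after shuffling one feature column."""
    baseline = mse(model, rows, targets)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
    return mse(model, permuted, targets) - baseline

rng = random.Random(0)
a_values = [rng.gauss(0, 1) for _ in range(2000)]

# Feature B is an exact copy of A, i.e. the two are perfectly correlated.
rows = [(a, a) for a in a_values]
targets = a_values  # the target is simply A

def uses_a_only(row):
    """Hypothetical model that relies on A alone."""
    return row[0]

def splits_a_and_b(row):
    """Hypothetical model that spreads the same signal over A and B."""
    return 0.5 * row[0] + 0.5 * row[1]

imp_a_single = permutation_importance(uses_a_only, rows, targets, 0, rng)
imp_a_split = permutation_importance(splits_a_and_b, rows, targets, 0, rng)
imp_b_split = permutation_importance(splits_a_and_b, rows, targets, 1, rng)

print("A, model using A only:", imp_a_single)
print("A, model splitting A and B:", imp_a_split)
print("B, model splitting A and B:", imp_b_split)
```

With these assumptions, the importance of A should come out around 2 for the first model but only around 0.5 for the second, even though both models predict equally well. So adding a correlated copy makes each feature look less important than the underlying signal really is, which I think is at least part of why correlated features are a problem for PFI.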