Hello Shilpi @Shilpi_Kumar ,
Welcome to the community!
If we one-hot encode a BINARY variable, then the 2 resulting features are perfectly (negatively) correlated, because one is always 1 minus the other. We don’t want that, and we actually don’t need to one-hot encode a binary variable.
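A quick way to check this is to encode a small binary column by hand and compute the Pearson correlation with NumPy's `np.corrcoef`. The `smoker` column below is made-up data for illustration. The second one-hot column is exactly 1 minus the first, so their correlation is -1 (up to floating-point rounding).

```python
import numpy as np

# Made-up binary variable, for illustration only
smoker = np.array(["yes", "no", "no", "yes", "no", "yes", "no"])

# One-hot encode it by hand into two 0/1 columns
is_yes = (smoker == "yes").astype(float)
is_no = (smoker == "no").astype(float)

# The second column carries no new information: it is 1 - is_yes
print(np.array_equal(is_no, 1 - is_yes))

# Pearson correlation between the two one-hot columns (about -1)
r = np.corrcoef(is_yes, is_no)[0, 1]
print(r)
```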
If we one-hot encode a variable with 3 or more classes, and at least 3 of those classes actually appear in the data, then no pair of the resulting features is perfectly correlated. However, they do have non-zero (negative) correlations.
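Here is a similar check for a 3-class variable, again on made-up data (the `colors` array is hypothetical). Every pair of one-hot columns comes out negatively correlated, but none reaches -1. For two categories with frequencies $p_i$ and $p_j$, the correlation works out to $-\sqrt{p_i p_j / ((1-p_i)(1-p_j))}$, which equals -1 only when $p_i + p_j = 1$, i.e. when only those two categories appear in the data.

```python
import numpy as np

# Made-up 3-class variable: red appears 3 times, green 2, blue 3
colors = np.array(["red", "green", "blue", "red", "blue", "green", "red", "blue"])

# One-hot encode: one 0/1 column per category (np.unique sorts them)
categories = np.unique(colors)  # blue, green, red
onehot = (colors[:, None] == categories[None, :]).astype(float)

# Pairwise Pearson correlations between the one-hot columns
corr = np.corrcoef(onehot, rowvar=False)

# Off-diagonal entries: about -0.447 (blue/green, green/red) and -0.6 (blue/red)
print(np.round(corr, 3))
```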
Now for some discussion of what to do when two features are somewhat correlated.
Can we ignore some of the (non-perfectly) correlated features?
Not easily.
Even among numerical features, some degree of correlation is common. Leaving one of them out needs more justification than just the correlation value: do we clearly understand the causal relationship between the feature we leave out and the ones that remain?
Also, correlated features are present in most, if not all, real datasets.
Will having collinear features reduce the model’s performance?
Generally speaking, with the right choice of hyperparameters, it shouldn’t.
Consider two perfectly correlated features $x_1$ and $x_2$ in a linear regression, with weights $w_1$ and $w_2$ assigned to them, and for simplicity take $x_2 = x_1$. Then $w_1 x_1 + w_2 x_2 = (w_1 + w_2) x_1 = w_1' x_1$, which reduces to using only one of the two features. However, this makes $w_1$ and $w_2$ hard to interpret. Any combination with the same sum $w_1' = w_1 + w_2$ gives the same predictions, so, for example, $(w_1 = 1, w_2 = 2)$ and $(w_1 = -10, w_2 = 13)$ are equally good, and we can’t say anything precise about the individual contributions of $x_1$ and $x_2$. A similar, though weaker, effect occurs with partially correlated features.
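To see the interpretation problem numerically, here is a small NumPy sketch on random data, assuming the simplest case $x_2 = x_1$. Both weight pairs from the example above give identical predictions. When we actually fit $y = 3x_1$ with `np.linalg.lstsq`, there are infinitely many exact solutions, and NumPy returns the one with the smallest norm, $(1.5, 1.5)$. A different solver or regularizer could return a different split of the same total weight.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1.copy()  # perfectly correlated (here: identical) second feature
X = np.column_stack([x1, x2])

# Two very different weight vectors with the same sum w_1 + w_2 = 3
pred_a = X @ np.array([1.0, 2.0])
pred_b = X @ np.array([-10.0, 13.0])
print(np.allclose(pred_a, pred_b))  # the predictions match

# Fit y = 3 * x1 by least squares. X has rank 1, so the solution is not
# unique and lstsq returns the minimum-norm one, about [1.5, 1.5].
y = 3.0 * x1
w, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(rank, w)
```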
However, suppose we have 1000 features in total, 999 of which are perfectly correlated with one another. This may hurt a gradient-boosted decision tree model that uses random feature selection (for example, picking a random subset of features for each tree). Because features are picked at random, the 999 perfectly correlated features can dominate all the trees, leaving the single remaining, different feature to play little or no role in the final model.
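The effect of random feature selection can be sketched without any boosting library. The simulation below assumes, hypothetically, that each tree sees a random 10% of the 1000 features, similar in spirit to per-tree column subsampling in common GBDT libraries; the actual mechanism and rate depend on the library and its settings. The single different feature is available to only about 10% of the trees, while every tree gets about 100 features from the correlated group.

```python
import random

n_features = 1000        # 999 correlated features + 1 different one
unique_feature = 999     # hypothetical index of the single different feature
features_per_tree = 100  # assumed subsampling rate of 10% per tree
n_trees = 2000

rng = random.Random(0)
trees_with_unique = 0
for _ in range(n_trees):
    # Each tree draws its own random subset of features without replacement
    sampled = rng.sample(range(n_features), features_per_tree)
    if unique_feature in sampled:
        trees_with_unique += 1

frac = trees_with_unique / n_trees
# Expected value is features_per_tree / n_features = 0.1
print(frac)
```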
Therefore, if we know two features are perfectly correlated, we keep only one of them. Otherwise, which is the case for the one-hot encoded features of a variable with 3 or more classes, we use them with care.
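For the "keep only one" case, a minimal sketch could scan the columns and drop any column whose absolute Pearson correlation with an already-kept column is (numerically) 1. `drop_perfectly_correlated` and its `tol` parameter are hypothetical names for illustration, not part of any library. Note that it only checks pairs of columns, so it will not flag the full set of one-hot columns from a 3-class variable, even though those columns always sum to 1.

```python
import numpy as np

def drop_perfectly_correlated(X, tol=1e-9):
    """Hypothetical helper: return the indices of columns to keep, dropping
    any column whose |Pearson correlation| with a kept column is >= 1 - tol."""
    X = np.asarray(X, dtype=float)
    # Constant columns produce NaN correlations; silence the 0/0 warning
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(X, rowvar=False)
    keep = []
    for j in range(X.shape[1]):
        # NaN comparisons are False, so constant columns are never "perfect"
        if not any(abs(corr[j, k]) >= 1 - tol for k in keep):
            keep.append(j)
    return keep

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2 * x1 + 1                            # perfectly correlated with x1
x3 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])   # correlation 0.8 with x1
print(drop_perfectly_correlated(np.column_stack([x1, x2, x3])))  # [0, 2]
```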
Cheers,
Raymond