In the “Using one-hot encoding of categorical features” video, Andrew talks about one-hot encoding as a way of handling categorical features that can take on more than two values, creating k binary features where k is the number of possible values.
I recall that for regressions we’d generally use k-1 dummy features to prevent multicollinearity. Is multicollinearity something that is not usually a concern with decision trees because each split considers only one feature at a time?
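For concreteness, here is a quick sketch of the two encodings I have in mind (my own illustration with pandas, not from the course; the ear-shape values are just a toy stand-in for the example in the videos):

```python
import pandas as pd

# Hypothetical toy data, loosely based on the ear-shape example from the videos.
df = pd.DataFrame({"ear_shape": ["pointy", "floppy", "oval", "pointy"]})

# Full one-hot encoding as in the video: k = 3 binary columns.
one_hot_k = pd.get_dummies(df, columns=["ear_shape"])

# "Dummy" encoding as in regression practice: k - 1 = 2 columns,
# dropping the first level so the remaining columns are not redundant.
one_hot_k_minus_1 = pd.get_dummies(df, columns=["ear_shape"], drop_first=True)

print(one_hot_k.columns.tolist())          # 3 ear_shape_* columns
print(one_hot_k_minus_1.columns.tolist())  # 2 ear_shape_* columns
```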
Sorry, I didn’t express myself clearly - k-1 didn’t come from this course.
IIRC from econometrics classes, it is standard practice for (linear?) regressions to drop one of the dummy variables to avoid the dummy variable trap and the resulting multicollinearity. If all k dummies are present, exactly one of them is 1 and the rest are 0 for every observation; equivalently, their sum is always 1, so they are perfectly collinear with the intercept column and the design matrix no longer has full rank.
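To show what I mean, here is a small NumPy sketch (my own illustration, not from any course material): with an intercept plus all k dummies, the columns of the design matrix become linearly dependent, which is exactly the multicollinearity the dummy variable trap refers to.

```python
import numpy as np

# Hypothetical design matrix: intercept column plus all k = 3 dummies.
intercept = np.ones((4, 1))
dummies = np.array([
    [1, 0, 0],   # pointy
    [0, 1, 0],   # floppy
    [0, 0, 1],   # oval
    [1, 0, 0],   # pointy
])
X_full = np.hstack([intercept, dummies])

# The dummies sum to 1 in every row, i.e. they add up to the intercept
# column, so the 4 columns are linearly dependent: rank is 3, not 4.
print(np.linalg.matrix_rank(X_full))     # 3

# Dropping one dummy restores full column rank (3 columns, rank 3).
X_dropped = np.hstack([intercept, dummies[:, 1:]])
print(np.linalg.matrix_rank(X_dropped))  # 3
```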
I assume that multicollinearity with decision trees is not an issue because we generally split on a single dummy variable, but would like to confirm that.
Yes, I agree with you on that - the tree considers only one feature at a time when choosing a split, and we cannot always know beforehand which of the one-hot features carries the least information and would therefore be the best one to drop.
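As a quick sanity check, here is a minimal scikit-learn sketch (my own illustration, not from the course): a tree trained on all k one-hot columns fits without any problem; the redundancy among the columns only means that some of them may never be selected for a split.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: all k = 3 one-hot ear-shape columns plus a binary label.
X = [
    [1, 0, 0],  # pointy
    [0, 1, 0],  # floppy
    [0, 0, 1],  # oval
    [1, 0, 0],  # pointy
    [0, 1, 0],  # floppy
]
y = [1, 0, 1, 1, 0]

# The tree accepts all k redundant columns; each split tests just one of them.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=["pointy", "floppy", "oval"]))
```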