In the video, Andrew describes that a node can have 3 branches, so I'm not sure why I would transform my data with one-hot encoding and increase its size, when the decision tree algorithm can work on string data.
Decision trees are designed to make true/false decisions.
You cannot do that if the target variable is an enumerated set of integers.
So you need a logical output for each label.
Thanks for replying, @TMosh.
I do have a few follow-ups; if you can help resolve them, that would be great.
In my case, the target variable is binary, so that shouldn’t be a concern.
My question is about one of the input features which has 3 distinct classes (e.g., “Red”, “Green”, “Blue”).
Since decision trees can create multi-way splits (e.g., one node branching into 3 based on feature value), why is one-hot encoding necessary? Doesn’t that just increase dimensionality unnecessarily?
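For context on what the encoding step actually does: here is a minimal sketch (the `colors` data and variable names are made up for illustration) of one-hot encoding a 3-category feature by hand, turning one string column into one true/false column per category so that a binary-split tree can ask a yes/no question about each:

```python
# Hypothetical toy feature column with 3 categories
colors = ["Red", "Green", "Blue", "Red"]

# One-hot encode by hand: one 0/1 column per category, so a
# binary-split tree can ask e.g. "is it Red? yes/no" at a node
categories = sorted(set(colors))          # ['Blue', 'Green', 'Red']
encoded = [[int(c == cat) for cat in categories] for c in colors]

print(categories)   # ['Blue', 'Green', 'Red']
print(encoded[0])   # 'Red' -> [0, 0, 1]
```

So the dimensionality does grow (1 column becomes 3), but each new column is a pure true/false feature, which is the form a binary-split implementation expects.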
Also, for a multi-class target variable, can’t the decision tree simply split as:
```
Root → Is class A?
├── Yes → Class A
└── No  → Is class B?
    ├── Yes → Class B
    └── No  → Class C
```
Wouldn’t that still work without requiring one-hot encoding?
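The cascade in the diagram above can be sketched as plain nested true/false checks (a hypothetical `classify` helper, not anything from the course code), which is exactly what a chain of binary splits does internally, so three target classes don't require a three-way node:

```python
# Each internal node asks one true/false question; chaining two
# binary questions is enough to separate three classes.
def classify(is_a: bool, is_b: bool) -> str:
    if is_a:       # Root node: "Is class A?"
        return "A"
    if is_b:       # Next node: "Is class B?"
        return "B"
    return "C"     # Neither A nor B -> must be C

print(classify(False, True))   # -> "B"
```

This shows why multi-way splits aren't strictly needed: any k-way split can be rewritten as k-1 stacked binary splits.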