Why is one-hot encoding needed?

In the video, Andrew describes that a node can have three branches, so I'm not sure why I would transform my data with one-hot encoding and increase its size, especially when a decision tree algorithm can work on string data.

Decision trees are designed to make true/false decisions.
You cannot do that if the target variable is an enumerated set of integers.
So you need a logical output for each label.

Thanks for replying, @TMosh.
But I do have some follow-ups; if you can help resolve them, that would be great.

  1. In my case, the target variable is binary, so that shouldn’t be a concern.

My question is about one of the input features which has 3 distinct classes (e.g., “Red”, “Green”, “Blue”).

Since decision trees can create multi-way splits (e.g., one node branching into 3 based on the feature value), why is one-hot encoding necessary? Doesn't that just increase dimensionality unnecessarily? (A sketch of the encoding I'm referring to is at the end of this post.)

  2. Also, for a multi-class target variable, can't the decision tree simply split as:
Root → Is class A?
   ├── Yes → Class A
   └── No → Is class B?
           ├── Yes → Class B
           └── No → Class C

Wouldn’t that still work without requiring one-hot encoding?

Without one-hot encoding, the model may unintentionally learn an implied ordinal relationship among the values of that feature.

For a simple example, suppose you have a "lifeform" feature whose candidate values are 1 = amoeba, 2 = elephant, 3 = bacteria.

The sequence of values would imply that an amoeba is more closely related to an elephant than to a bacterium.
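
To make that concrete, here is a minimal sketch with made-up labels (scikit-learn is used here only as an example implementation, not necessarily what the assignment uses):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

y = np.array([1, 0, 1, 1, 0, 1])  # made-up binary target

# Integer-coded feature: 1 = amoeba, 2 = elephant, 3 = bacteria
X_int = np.array([[1], [2], [3], [1], [2], [3]])
tree_int = DecisionTreeClassifier(random_state=0).fit(X_int, y)
print(export_text(tree_int, feature_names=["lifeform"]))
# Splits are threshold tests (e.g. "lifeform <= 1.5", "lifeform <= 2.5"),
# so isolating "elephant" takes two splits, purely because 2 happens to
# sit between 1 and 3 in an ordering that means nothing biologically.

# One-hot encoding: one yes/no column per category, no implied ordering
X_hot = np.array([
    [1, 0, 0],  # amoeba
    [0, 1, 0],  # elephant
    [0, 0, 1],  # bacteria
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])
tree_hot = DecisionTreeClassifier(random_state=0).fit(X_hot, y)
print(export_text(tree_hot, feature_names=["amoeba", "elephant", "bacteria"]))
# A single split on the "elephant" column is enough to separate the classes.
```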