Hi,
I have two questions regarding one-hot encoding, please help me understand them:
1/ One-hot encoding is used for categorical features in, for example, a regression model or a neural network, to make their numerical representation meaningful. For example, if the feature "cat's ear" has 3 types of shapes and we number them 1, 2, 3, this will not convey the right meaning to the model (it would imply something like ear type 2 contributing to the price of the cat twice as much as type 1, which is not true). Do I understand this correctly?
2/ In a decision tree, why do we need to apply one-hot encoding, instead of just splitting the tree into the same number of branches as there are categories? Is it because there is a problem with calculating the entropy, or something else?
Thank you!
Yes, we can speak about the contribution of a feature in a given model by its weight, but I must emphasize that any such statement is bound to that model, not a general truth.
The problem is the same as the one we have in a linear regression model: the arbitrary ranking. 3 different shapes may result in 6 different rankings. Obviously, a different ranking can result in a different decision tree model with a different performance. How can we get rid of this uncertainty? One-hot encoding.
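If it helps to see the difference, here is a minimal sketch (assuming pandas is available; the column and shape names are made up for illustration):

```python
import pandas as pd

# A hypothetical toy dataset with one categorical feature: ear shape
df = pd.DataFrame({"ear_shape": ["square", "round", "triangular", "round"]})

# Arbitrary integer labels: this imposes an ordering (square < round < triangular)
# that the categories do not actually have.
df["ear_shape_label"] = df["ear_shape"].map({"square": 0, "round": 1, "triangular": 2})

# One-hot encoding: each shape gets its own 0/1 column, so no ordering is implied.
one_hot = pd.get_dummies(df["ear_shape"], prefix="ear_shape")
print(one_hot)
```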
So you are saying that if there are 10 categorical values, then it splits the feature into 10 sub-branches?
What do you think?
Cheers.
Raymond
Oh.
So it’s something like, for example, there are 10 shapes of animal ears, and most cats have ears of shape 1, so deciding whether an animal is a cat or not should be based on whether its ear is of shape 1 or not, and the remaining 9 shapes are not that important for the decision. Is that correct?
How did you come to that? I don’t quite follow it. Anything in my last reply that leads you there?
You were talking about the ranking, so I understood that different categories of the same feature might have different contributions to the decision, like splitting on one category might result in lower entropy than the others.
I talked about the ranking, and said that, with 3 categorical values, there can be 6 rankings. Let’s say we have square, round, and triangular shapes and we label them as S, R, and T respectively.
The 6 rankings are:
- S R T
- S T R
- R S T
- R T S
- T R S
- T S R
In the first ranking, we are implying that S < R < T because S is 0, R is 1, and T is 2.
This (S < R < T) has an implication for the linear regression model and for the decision tree’s splitting algorithm.
We can ask ourselves: how do we pick one out of the above 6 possibilities? Are we aware that we are actually making a pick at all?
Different rankings have different implications.
To get rid of these differences, we one-hot-encode.
Above is all I wanted to say. As for whether a shape is important or not, the optimization of our linear model/decision tree will decide.
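If you want to see the effect of the rankings yourself, here is a minimal sketch (assuming scikit-learn; the target values are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up target values for the three shapes S, R, T
y = np.array([1.0, 5.0, 2.0])

# Two of the 6 possible rankings of the same three shapes
ranking_1 = np.array([[0.0], [1.0], [2.0]])  # S=0, R=1, T=2
ranking_2 = np.array([[2.0], [0.0], [1.0]])  # S=2, R=0, T=1

for X in (ranking_1, ranking_2):
    model = LinearRegression().fit(X, y)
    print(model.coef_, model.score(X, y))  # different slope and R^2 per ranking

# With one-hot encoding, each shape gets its own column, so the fit no longer
# depends on any arbitrary ordering of the shapes.
X_onehot = np.eye(3)
model = LinearRegression().fit(X_onehot, y)
print(model.score(X_onehot, y))  # same (here: perfect) fit for any label order
```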
Oh I understand that now. Thank you!
But still, in the decision tree, if it splits the feature into 10 sub-branches, what is the problem? Don’t we eventually end up with the same 10 sub-branches if we sequentially split on the 10 new one-hot encoded features?
Then I understand that the problem is that it might not split on all 10 features, but might choose only some of them during the process. Is that correct?
Yes. Splitting into 10 may not be optimal. A decision tree only splits but does not re-group. If we only split one node into two each time, then we have the freedom to finally end up with 2, 3, 4, or up to 10 branches, whichever is optimal.
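Here is a minimal sketch of that idea (assuming scikit-learn; the data is made up): with one-hot encoding, the tree is free to split only on the "shape 1" column and ignore the other 9.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Made-up data: 10 ear shapes (0..9); an animal is a "cat" exactly when shape == 1
shapes = rng.integers(0, 10, size=200)
is_cat = (shapes == 1).astype(int)

# One-hot encode the single categorical feature into 10 binary columns
X = np.eye(10)[shapes]

tree = DecisionTreeClassifier().fit(X, is_cat)
print(export_text(tree, feature_names=[f"shape_{i}" for i in range(10)]))
# The printed tree splits only on shape_1; the other 9 columns are never used.
```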
I completely understand now.
I really appreciate your help, have a nice day!
You are welcome! Since your questions are cleared up, you may want to read about Optimal Partitioning and Target Encoding. They allow you to avoid one-hot encoding a categorical feature, but be careful: they have their own pros and cons.
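For example, here is a minimal sketch of target encoding (assuming pandas; the data is made up):

```python
import pandas as pd

# Made-up data: a categorical feature and a numeric target
df = pd.DataFrame({
    "ear_shape": ["S", "R", "T", "S", "R", "S"],
    "price":     [10,  20,  30,  12,  22,  11],
})

# Target encoding: replace each category with the mean target of that category.
# (In practice, compute the means on training data only, and consider smoothing,
# to avoid target leakage and overfitting on rare categories.)
means = df.groupby("ear_shape")["price"].mean()
df["ear_shape_te"] = df["ear_shape"].map(means)
print(df)
```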