C2 - W4 - Multiple targets

Hello, I’m new to programming and I’m not a computer scientist. I’m studying AI so that I can apply decision trees in my field.
I have a question about the targets in my tree. In the course, the instructor uses an example where the target is either "cat" or "not cat". In my field, I have a DataFrame with several columns, and the target column has 6 different values. I thought about applying OneHotEncoder to create new columns with values of 1 or 0 and passing those new columns to the tree as the target. I also thought about using .map with a dictionary to replace the values in the original DataFrame with integers from 0 to 5, so that I would pass only one column as the target. Which is the better alternative: using OneHotEncoder and passing several columns, or passing a single column created with .map?
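For reference, here is a minimal sketch of the two options described above, assuming pandas and scikit-learn. The column names (`feature_a`, `target`) and the class names "A" to "F" are hypothetical placeholders. Option 1 produces six 0/1 columns; option 2 produces one integer column.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column names and the six class names are placeholders.
df = pd.DataFrame({
    "feature_a": [1.0, 2.5, 3.1, 0.7, 4.2, 5.0],
    "target": ["A", "B", "C", "D", "E", "F"],
})

# Option 1: one-hot encode the target into six 0/1 columns.
encoder = OneHotEncoder()
onehot = encoder.fit_transform(df[["target"]]).toarray()  # dense array, one column per class
y_onehot = pd.DataFrame(onehot, columns=encoder.categories_[0], index=df.index)

# Option 2: map each class name to a single integer code (0 to 5) with .map.
class_to_int = {name: i for i, name in enumerate(sorted(df["target"].unique()))}
y_mapped = df["target"].map(class_to_int)

print(y_onehot.shape)
print(y_mapped.tolist())
```

Note that six classes give the codes 0 to 5, not 0 to 6.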


Hi @eudesmedeiros,

One-hot encoding keeps each category distinct and suits algorithms that expect categorical inputs, but it increases dimensionality, which may lead to overfitting.

Mapping to integers reduces dimensionality and simplifies the representation. However, be cautious: it may blur the distinction between categories and invite misinterpretation, because the integers can be read as an order or ranking (e.g., treating class 1 as better than class 6, or vice versa).
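To make the trade-off concrete, here is a hedged sketch using scikit-learn's `DecisionTreeClassifier` on hypothetical toy data (one feature, six classes, two rows each). With an integer-coded target the tree solves an ordinary multi-class problem and returns one label per row. With a one-hot target, scikit-learn treats it as six separate binary outputs (multi-output) and returns six 0/1 values per row.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one numeric feature, six classes (0-5), two rows each.
X = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1],
              [3.0], [3.1], [4.0], [4.1], [5.0], [5.1]])
y_int = np.repeat(np.arange(6), 2)       # single target column
y_onehot = np.eye(6, dtype=int)[y_int]   # six 0/1 target columns

# Integer-coded target: ordinary multi-class tree, one predicted label per row.
clf_int = DecisionTreeClassifier(random_state=0).fit(X, y_int)

# One-hot target: fitted as a multi-output problem with six binary outputs.
clf_onehot = DecisionTreeClassifier(random_state=0).fit(X, y_onehot)

print(clf_int.predict([[2.0]]))
print(clf_onehot.predict([[2.0]]))
```

With the multi-output version, nothing forces each prediction to contain exactly one 1. An impure leaf can predict zero or several positive outputs, so the single-column target is usually simpler for a multi-class tree.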


The big issue with using enumerated integers as classes is that the model is going to think that classes 2 and 3 are more closely related than classes 1 and 4, for example.
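This concern applies when the integer codes are treated as numbers, for example when they are fed to a regressor. A classifier such as `DecisionTreeClassifier` treats them as labels. The sketch below uses hypothetical toy data with scikit-learn's `DecisionTreeRegressor`. A shallow regression tree averages neighbouring codes, so some predictions land "between" two classes, which has no meaning for labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: six classes coded 0..5, two rows per class,
# with a single feature that increases along with the class code.
X = np.arange(12, dtype=float).reshape(-1, 1)
y_codes = np.repeat(np.arange(6), 2)

# Treating the codes as numbers: a depth-2 regression tree has at most 4 leaves
# for 6 classes, so some leaves average two neighbouring codes.
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y_codes)
preds = reg.predict(X)

# Leaves that mix two neighbouring classes predict values ending in .5,
# i.e. "halfway between" two classes.
print(np.unique(preds))
```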

Hello @eudesmedeiros,

From your description, your target label has six different values, which means that it is a multi-class classification problem.

To model it, you can use sklearn’s LabelEncoder to convert the classes into integers 0 to 5. Then you can pass the converted target to xgboost, lightgbm, or sklearn to train a decision-tree-based model.
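A minimal sketch of this workflow using only scikit-learn. The DataFrame, its column names, and the six class names are hypothetical placeholders. `LabelEncoder` assigns codes in sorted order of the class names, and `inverse_transform` turns predicted codes back into the original names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical DataFrame: column names and class names are placeholders.
df = pd.DataFrame({
    "feature_a": [0.0, 0.2, 1.0, 1.2, 2.0, 2.2, 3.0, 3.2, 4.0, 4.2, 5.0, 5.2],
    "feature_b": [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    "target": ["low", "low", "mid", "mid", "high", "high",
               "very_high", "very_high", "peak", "peak", "other", "other"],
})

X = df[["feature_a", "feature_b"]]
le = LabelEncoder()
y = le.fit_transform(df["target"])  # one column of integer codes 0..5

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Predictions come back as integer codes; inverse_transform restores the names.
codes = model.predict(X.iloc[:2])
print(le.inverse_transform(codes))
```

The same integer target can be passed to other tree libraries; keeping the fitted `LabelEncoder` lets you map their predictions back to class names too.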

Cheers,
Raymond