Handling missing data (NaN) before applying decision tree models

Sumaiya639 · February 9, 2024, 11:48pm

Hello everyone,
I’m working on a project and would like your ideas on handling NaN values before applying a tree ensemble to a classification problem.
Some options are:
Drop the rows that have NaN values
Univariate imputation
Multivariate imputation
What do you suggest?

Thanks
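For reference, the three options in the question roughly map onto pandas `dropna`, scikit-learn's `SimpleImputer` (univariate) and `IterativeImputer` (multivariate). The sketch below applies each to a small hypothetical DataFrame. It assumes pandas and a scikit-learn version where `IterativeImputer` still has to be enabled through `sklearn.experimental`. The median strategy is just one illustrative choice.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
# IterativeImputer is experimental and must be enabled before importing it
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical toy feature matrix with one missing entry in rows 1, 2 and 3
X = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 51.0, 33.0],
    "income": [40.0, 52.0, np.nan, 88.0, 45.0],
    "tenure": [1.0, 3.0, 8.0, np.nan, 2.0],
})

# Option 1: drop every row that has at least one NaN
X_dropped = X.dropna()

# Option 2: univariate imputation, each column filled from its own median
X_uni = pd.DataFrame(SimpleImputer(strategy="median").fit_transform(X),
                     columns=X.columns)

# Option 3: multivariate imputation, each column predicted from the others
X_multi = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(X),
                       columns=X.columns)

print(len(X_dropped), X_uni.isna().sum().sum(), X_multi.isna().sum().sum())  # → 2 0 0
```

Dropping rows keeps only 2 of the 5 examples here, while both imputers keep all 5. That trade-off is the one discussed in the replies.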

TMosh · February 10, 2024, 1:18am

None of them is a very good solution compared to having a complete data set.
They all work relatively badly, in different ways.

I think the choice depends on how much data you’re missing. Dropping a row throws away its other (N-1) feature values just because 1 is missing, and whether that is acceptable depends on the magnitude of N.
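One way to put a number on "how much data you're missing" is to measure the missing fraction per column and the fraction of rows that `dropna()` would discard. The sketch below does this with pandas. The DataFrame and the helper names `missingness_report` and `rows_lost_by_dropna` are hypothetical.

```python
import numpy as np
import pandas as pd

def missingness_report(df: pd.DataFrame) -> pd.DataFrame:
    """Fraction of missing values per column, sorted from most to least missing."""
    frac = df.isna().mean().sort_values(ascending=False)
    return frac.to_frame("missing_fraction")

def rows_lost_by_dropna(df: pd.DataFrame) -> float:
    """Fraction of rows that df.dropna() would throw away."""
    return 1.0 - len(df.dropna()) / len(df)

# Hypothetical example: 4 features, column "c" is mostly empty
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [1, np.nan, 3, 4],
    "c": [np.nan, np.nan, np.nan, 4],
    "d": [1, 2, 3, 4],
})

report = missingness_report(df)
print(report)
print(rows_lost_by_dropna(df))                   # → 0.75
print(rows_lost_by_dropna(df.drop(columns="c")))  # → 0.25
```

In this toy case, dropping rows would discard three of the four examples because one sparse column is involved. Dropping that column first means `dropna()` only loses one row.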

There’s no harm in trying several methods and picking the one that gives the least-worst performance.
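"Trying several methods" can be done fairly with cross-validation, as long as each imputer sits inside a `Pipeline` so it is re-fit on every training fold. Below is a minimal sketch, assuming synthetic data from `make_classification` with roughly 10% of entries removed at random. Real missingness is rarely this well behaved. The candidate list is only an illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with about 10% of entries knocked out at random (an assumption)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.10] = np.nan

candidates = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    # add_indicator appends one 0/1 "was missing" column per incomplete feature
    "mean+indicator": SimpleImputer(strategy="mean", add_indicator=True),
    "iterative": IterativeImputer(random_state=0),
}

scores = {}
for name, imputer in candidates.items():
    # The imputer is inside the pipeline, so it only sees each training fold
    pipe = make_pipeline(imputer, DecisionTreeClassifier(random_state=0))
    scores[name] = cross_val_score(pipe, X, y, cv=5).mean()

# Best (or "least-worst") strategy first
for name, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {s:.3f}")
```

The ranking depends on the data, so it is worth rerunning this comparison on your own features rather than relying on the synthetic result.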

Sumaiya639 · February 11, 2024, 12:38am

TMosh,
Thanks for the reply.
Please advise me on where to go from here.

Steps I have already applied:
Removed the columns in which more than 50% of the rows are null.
Converted the categorical columns to dummy variables using get_dummies.
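The two preprocessing steps above can be sketched in pandas as follows. The DataFrame is hypothetical. Note that `pd.get_dummies` only encodes the listed categorical columns and leaves NaN in numeric columns untouched. That is one plausible reason a tree model can still fail after these steps. `dummy_na=True` is an optional choice that gives missing categories their own column.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: "notes" is mostly empty, "colour" is categorical
df = pd.DataFrame({
    "size": [1.0, 2.0, np.nan, 4.0],
    "colour": ["red", "blue", None, "red"],
    "notes": [np.nan, np.nan, np.nan, 1.0],
})

# Step 1: keep only columns where at most 50% of the rows are null
keep = df.columns[df.isna().mean() <= 0.5]
df = df[keep]

# Step 2: one-hot encode the categorical column; dummy_na=True adds an
# extra column marking rows where the category was missing
df = pd.get_dummies(df, columns=["colour"], dummy_na=True)

print(df.columns.tolist())
print(int(df["size"].isna().sum()))  # → 1: numeric NaNs are still there
```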

When I ran this:
model = DecisionTreeClassifier(min_samples_split=min_samples_split,
                               random_state=RANDOM_STATE).fit(X_train, y_train)

I got an error.

TMosh · February 11, 2024, 2:35am

There’s a lot of advice in the error message. I don’t have much to add.

TMosh · February 11, 2024, 2:36am

You could try replacing all of the NaN values with the mean value for that feature.
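Mean imputation can be combined with the `DecisionTreeClassifier` call from the question in a scikit-learn `Pipeline`. This way the column means are learned from `X_train` only and then reused on the test set. The sketch below assumes synthetic data. The values of `RANDOM_STATE` and `min_samples_split` are hypothetical placeholders for whatever you used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

RANDOM_STATE = 55        # hypothetical value
min_samples_split = 50   # hypothetical value

X, y = make_classification(n_samples=300, n_features=5, random_state=RANDOM_STATE)
X[::7, 2] = np.nan  # hypothetical: every 7th row is missing feature 2
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=RANDOM_STATE)

model = Pipeline([
    # Replace each NaN with that feature's mean, computed on X_train only
    ("impute", SimpleImputer(strategy="mean")),
    ("tree", DecisionTreeClassifier(min_samples_split=min_samples_split,
                                    random_state=RANDOM_STATE)),
]).fit(X_train, y_train)

print(model.named_steps["impute"].statistics_.round(3))
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```

Fitting the imputer inside the pipeline avoids leaking test-set statistics into training, which can happen if you call `fillna(df.mean())` on the full data set before splitting.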
