C3M4_Assignment confusion

jacko99 · July 21, 2025, 9:11am

Hello,

I have a question regarding the correct way to handle one-hot encoding for new prediction data, specifically in Step 6 of our process.

1. This image shows the raw test data in test_cars.csv used for prediction. Note the Fuel type column: the first three records have type ‘X’, and the fourth has type ‘Z’.

2. This image shows X_test_multi after the raw data is run through the prepared preprocessing pipeline. The issue is in the Fuel_type_X column: it is populated with all zeros. Since the Fuel type for the first three rows is ‘X’, this column should be 1 in those rows.

Doesn't this produce unreliable predictions?

Please help me clarify this. Thanks~

Girijesh · July 21, 2025, 9:34am

Dear @jacko99,

You’re right, the predictions are unreliable because the one-hot encoding is incorrect.

Issue:

Fuel type ‘X’ appears in rows 1–3, but Fuel_type_X is 0 for all.
This likely happened because the encoder used during training wasn’t reused for the test data.

Fix:

Fit the encoder on the training data only, and reuse it for the test data.
Use OneHotEncoder(handle_unknown='ignore').
Wrap the preprocessing and the model in a single pipeline to ensure consistency.

Let me know if you need help fixing the code.

rmwkwok · July 21, 2025, 9:43am

Hello, @jacko99,

Nice catch. The reason is that when we call pd.get_dummies, we set the parameter drop_first=True. Because the test set contains only X and Z, the X column (the first category present in the test set) is dropped. We then call fuel_type_dummies_test.reindex, which adds the X column back but fills it with 0.

I agree, and I think that’s a bug.
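The behaviour described above can be reproduced with a small sketch. The training categories D, E, X and Z are hypothetical (the thread does not show the training set), and the reindex call only approximates the idea; it is not the assignment's exact code.

```python
import pandas as pd

# Hypothetical training categories; the real training set is not shown in the thread.
train = pd.DataFrame({"Fuel_type": ["D", "E", "X", "Z", "X"]})
# Test data mirroring the thread: three rows of 'X' and one row of 'Z'.
test = pd.DataFrame({"Fuel_type": ["X", "X", "X", "Z"]})

# Training dummies: drop_first=True drops 'D', the first training category,
# leaving Fuel_type_E, Fuel_type_X, Fuel_type_Z.
train_dummies = pd.get_dummies(train["Fuel_type"], prefix="Fuel_type",
                               drop_first=True, dtype=int)

# Test dummies: drop_first=True drops 'X', the first category present in the
# test set, leaving only Fuel_type_Z.
test_dummies = pd.get_dummies(test["Fuel_type"], prefix="Fuel_type",
                              drop_first=True, dtype=int)

# Aligning to the training columns adds Fuel_type_X back, filled with 0.
test_dummies = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(test_dummies)
```

Fuel_type_X is 0 in every row, so the three ‘X’ cars end up encoded exactly like the dropped reference category D.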

@jacko99,

As far as this assignment is concerned, I believe this shouldn’t fail your submission, but if it does, please let us know here.
In practice, it would be better to use an alternative such as sklearn.preprocessing.OneHotEncoder, or to keep using pd.get_dummies but do the dropping ourselves.
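One possible way to "do the dropping ourselves" with pd.get_dummies, sketched with the same hypothetical categories (D, E, X, Z are assumptions): encode without drop_first, choose the reference column once from the training data, and align the test columns to the training columns.

```python
import pandas as pd

# Hypothetical data; the training categories are assumptions.
train = pd.DataFrame({"Fuel_type": ["D", "E", "X", "Z", "X"]})
test = pd.DataFrame({"Fuel_type": ["X", "X", "X", "Z"]})

# 1. Encode the training data WITHOUT drop_first, then drop a reference column
#    that is fixed once, based on the training data only.
train_dummies = pd.get_dummies(train["Fuel_type"], prefix="Fuel_type", dtype=int)
reference_col = train_dummies.columns[0]  # 'Fuel_type_D' for this data
train_dummies = train_dummies.drop(columns=reference_col)

# 2. Encode the test data without drop_first, then align it to the training columns.
#    reindex removes columns not in training (including the reference column)
#    and fills training columns missing from the test set with 0.
test_dummies = pd.get_dummies(test["Fuel_type"], prefix="Fuel_type", dtype=int)
test_dummies = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(test_dummies)
```

Because the reference column is chosen from the training data, Fuel_type_X is now 1 for the three ‘X’ rows. A test category never seen in training is also removed by reindex, so it ends up encoded like the reference category.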

@chris.favila, please take a look at this case.

Cheers,
Raymond

rmwkwok · July 21, 2025, 9:51am

@jacko99, I have filed a report to the course team. Thanks for sharing this.

jacko99 · July 21, 2025, 12:39pm

Dear @Girijesh

Thank you for the quick and clear confirmation. That makes perfect sense.

I’ve already updated the code to reuse the same fitted OneHotEncoder, and I’m now getting the prediction results.

Appreciate the guidance!

jacko99 · July 21, 2025, 12:43pm

Dear @rmwkwok

Thank you for the detailed explanation. That clarifies why the re-indexing caused the bug.

I can confirm that I have now used sklearn.preprocessing.OneHotEncoder as you suggested, and it gives me the predictions.

Thanks for your help!

rmwkwok · July 21, 2025, 1:03pm

You are welcome, @jacko99!

Cheers,
Raymond

Girijesh · July 21, 2025, 1:35pm

Great work @jacko99. Keep learning.

Thank you!

chris.favila · July 24, 2025, 7:07am

Hi, and thank you for reporting! Good catch! The code for Step 6 has been updated for new learners taking the assignment.
