MLS C1 W2 Lab 5 question about normalization (moderator edit)

Arun_N · December 28, 2023, 12:24am

week-2
I am working through the optional lab in week 2 on Coursera.

Towards the end of the code, the lab makes its predictions on the normalized training data. However, it is not clear how I would make a prediction for a new set of data.
For example, if I want to predict the price of two houses, one of 1200 sq ft and the other of 1800 sq ft, both with 2 bedrooms, 2 floors, and 1 year old, how would I pass this data to the model to predict the values?

If I passed the data as is (without attempting to normalize it), the predicted prices are 132259.25 & 275303.55. I was expecting something like 132 and 275 since the input y_train seems to be in that range (read as thousands). If I have to normalize before passing it in, how would I go about normalizing my data?

my_data = [[1200, 2, 2, 1], [2500, 2, 2, 1]]  # sq ft, bedrooms, floors, age for each house I want to predict
print(f"My home prices: {sgdr.predict(my_data)}")

rmwkwok · December 28, 2023, 1:17am

Hello @Arun_N,

What normalization constants did the training set use to normalize the training data?

Raymond

TMosh · December 28, 2023, 3:01am

You would normalize the features of the new example with the same constants that the normalization function returned for the training set.
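
As a rough sketch of that idea with plain NumPy, assuming z-score normalization: the training values below are made up for illustration and are not the lab's data, and the names mu, sigma, and X_new are placeholders.

```python
import numpy as np

# Hypothetical toy training features: size (sq ft), bedrooms, floors, age.
# These values are invented for illustration only.
X_train = np.array([
    [1000.0, 2.0, 1.0, 30.0],
    [1500.0, 3.0, 2.0, 20.0],
    [2000.0, 3.0, 1.0, 10.0],
    [2500.0, 4.0, 2.0, 5.0],
])

# The normalization constants come from the training set only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_train_norm = (X_train - mu) / sigma

# A new example is normalized with the training mu and sigma,
# not with statistics computed from the new example itself.
X_new = np.array([[1200.0, 2.0, 2.0, 1.0]])
X_new_norm = (X_new - mu) / sigma
```

The point is that mu and sigma have to be kept around after training so they can be reused at prediction time.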

Note: I updated the thread title to be more descriptive (course name, week number, lab number, short description of the issue).

TMosh · December 28, 2023, 3:07am

Update: The lab uses scikit-learn’s “StandardScaler” preprocessor.

The documentation for this class describes a number of additional methods.
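
For illustration, here is a minimal sketch of a few of those methods and attributes. The training values are made up (not the lab's data), and the variable names are placeholders.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy training features: size (sq ft), bedrooms, floors, age.
X_train = np.array([
    [1000.0, 2.0, 1.0, 30.0],
    [1500.0, 3.0, 2.0, 20.0],
    [2000.0, 3.0, 1.0, 10.0],
    [2500.0, 4.0, 2.0, 5.0],
])

scaler = StandardScaler()
# fit_transform() learns the per-feature mean and scale, then applies them.
X_train_norm = scaler.fit_transform(X_train)

# transform() reuses the constants learned above on new data.
X_new = np.array([[1200.0, 2.0, 2.0, 1.0]])
X_new_norm = scaler.transform(X_new)

# The learned constants are exposed as mean_ and scale_,
# so the same result can be reproduced by hand.
manual = (X_new - scaler.mean_) / scaler.scale_

# inverse_transform() maps normalized values back to the original units.
X_new_back = scaler.inverse_transform(X_new_norm)
```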

Arun_N · December 28, 2023, 12:30pm

Thanks for the replies. Looking at the documentation, I realize there are two relevant methods: fit_transform() and transform().
fit_transform() is used on the training set to arrive at the normalization constants, and transform() can later be used on new data with those same constants. I modified my code as shown below, and it looks like I am getting the right answers. Thank you both!

my_data = [[1200, 2, 2, 1], [1400, 2, 2, 1], [1600, 2, 2, 1], [2600, 2, 2, 1]]
print(f"My normalized data: {scaler.transform(my_data)}")
print(f"My prediction using normalized data: {sgdr.predict(scaler.transform(my_data))}")

Output:
My prediction using normalized data: [253.13 286.99 320.84 490.13]
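
One possible way to avoid calling scaler.transform() by hand each time is to chain the scaler and the regressor in a scikit-learn pipeline. This is only a sketch under assumptions: the training data below is invented, the SGDRegressor settings are a guess rather than the lab's exact configuration, and the pipeline is an alternative to the lab's approach, not what the lab itself does.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: size (sq ft), bedrooms, floors, age.
X_train = np.array([
    [1000.0, 2.0, 1.0, 30.0],
    [1500.0, 3.0, 2.0, 20.0],
    [2000.0, 3.0, 1.0, 10.0],
    [2500.0, 4.0, 2.0, 5.0],
])
y_train = np.array([200.0, 300.0, 350.0, 450.0])  # made-up prices, in thousands

# The pipeline fits the scaler on the training data, then fits the regressor
# on the scaled features. Settings here are assumptions for illustration.
model = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, random_state=0))
model.fit(X_train, y_train)

# predict() applies the stored scaling to raw features before predicting.
X_new = np.array([[1200.0, 2.0, 2.0, 1.0], [1800.0, 2.0, 2.0, 1.0]])
pred_pipeline = model.predict(X_new)

# Equivalent manual version, using the two fitted steps directly.
scaler_step = model.named_steps["standardscaler"]
sgdr_step = model.named_steps["sgdregressor"]
pred_manual = sgdr_step.predict(scaler_step.transform(X_new))
```

With this arrangement the raw features can be passed straight to model.predict(), since the normalization constants travel with the model.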

rmwkwok · December 28, 2023, 1:48pm

Great work!
