MLS C1 W2 Lab 5 question about normalization (moderator edit)

week-2
I am working through the optional lab in week 2 on Coursera.

Towards the end of the code, the lab makes predictions on the normalized training data. However, it is not clear how I would make a prediction for new data.
For example, if I want to predict the prices of two houses, one with 1200 sq ft and the other with 2500 sq ft, both with 2 bedrooms, 2 floors, and an age of 1 year, how would I pass this data to the model?

If I pass the data as is (without normalizing it), the predicted prices are 132259.25 and 275303.55. I was expecting values like 132 and 275, since y_train appears to be in that range (read as thousands of dollars). If I have to normalize the data before passing it in, how would I go about it?

```python
# sq ft, bedrooms, floors, age (years) for each house I want to predict
my_data = [[1200, 2, 2, 1], [2500, 2, 2, 1]]
print(f"My home prices: {sgdr.predict(my_data)}")
```
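One way to see why raw inputs give predictions that are far too large: the weights were learned on normalized features, which sit around zero with unit spread, not in the thousands. The sketch below uses a hypothetical linear model; the weights `w`, bias `b`, and training constants `mu`/`sigma` are made-up numbers for illustration, not the lab's actual values.

```python
# HYPOTHETICAL weights and bias, as if learned on z-score normalized features
# (sq ft, bedrooms, floors, age). Made-up numbers for illustration only.
w = [110.0, -21.0, -32.0, -38.0]
b = 363.0

# HYPOTHETICAL training-set constants (per-feature mean and standard deviation).
mu = [1400.0, 2.7, 1.5, 38.0]
sigma = [430.0, 0.65, 0.5, 25.0]


def predict(x):
    """Linear model prediction f(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def zscore(x):
    """Normalize one example with the training-set mean and std."""
    return [(xj - m) / s for xj, m, s in zip(x, mu, sigma)]


house = [1200, 2, 2, 1]
raw_pred = predict(house)            # features on the wrong scale: 1200 gets multiplied by 110
norm_pred = predict(zscore(house))   # features on the scale the weights were trained on
print(raw_pred, norm_pred)
```

With raw square footage, the first term alone contributes 110 × 1200 to the prediction, which is why the output lands in the hundreds of thousands instead of the hundreds.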

Hello @Arun_N,

What normalization constants did the training set use to normalize the training data?

Raymond

You would apply the same normalization to the features of the new examples, using the constants that the normalization function computed from the training set.
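As a minimal sketch of what this means for z-score normalization (assuming the population standard deviation, which is what StandardScaler uses): the mean and standard deviation are computed once from the training rows and then reused, unchanged, for any new rows. The helper names `fit_zscore` and `transform_zscore` and the training rows are hypothetical.

```python
import math


def fit_zscore(X):
    """Compute per-feature mean and population std (divide by n) from training rows."""
    n = len(X)
    cols = list(zip(*X))
    mu = [sum(c) / n for c in cols]
    sigma = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, mu)]
    return mu, sigma


def transform_zscore(X, mu, sigma):
    """Normalize rows with previously computed constants (no refitting)."""
    return [[(v - m) / s for v, m, s in zip(row, mu, sigma)] for row in X]


# Made-up training set: sq ft, bedrooms, floors, age.
X_train = [[952, 2, 1, 65], [1244, 3, 1, 64], [1947, 3, 2, 17], [1725, 3, 2, 42]]
mu, sigma = fit_zscore(X_train)                      # constants come from training data only
X_train_norm = transform_zscore(X_train, mu, sigma)

my_data = [[1200, 2, 2, 1], [2500, 2, 2, 1]]
my_data_norm = transform_zscore(my_data, mu, sigma)  # same constants reused for new houses
print(my_data_norm)
```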

Note: I updated the thread title to be more descriptive (course name, week number, lab number, short description of the issue).

Update: The lab uses scikit-learn's "StandardScaler" preprocessor.

The documentation for this class lists several additional methods.
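For example, after fitting, the scaler stores the normalization constants as `mean_` and `scale_`, `transform()` is equivalent to `(x - mean_) / scale_`, and `inverse_transform()` goes the other way. A small sketch with made-up training rows, assuming numpy and scikit-learn are installed:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training features: sq ft, bedrooms, floors, age.
X_train = np.array([[952, 2, 1, 65], [1244, 3, 1, 64],
                    [1947, 3, 2, 17], [1725, 3, 2, 42]], dtype=float)

scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)

# After fitting, the normalization constants are stored on the scaler.
print("mean_: ", scaler.mean_)
print("scale_:", scaler.scale_)  # per-feature standard deviation

# transform() applies (x - mean_) / scale_ using the stored constants.
X_new = np.array([[1200, 2, 2, 1], [2500, 2, 2, 1]], dtype=float)
via_transform = scaler.transform(X_new)
via_formula = (X_new - scaler.mean_) / scaler.scale_

# inverse_transform() maps normalized values back to the original units.
X_back = scaler.inverse_transform(X_norm)
```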

Thanks for the replies. Looking at the documentation, I realized there are two methods: fit_transform() and transform().
fit_transform() is used on the training set to compute the normalization constants (and normalize the training data); transform() can then be used on new data to apply those same constants. I modified my code as shown below, and it looks like I am getting the right answers. Thank you both!

```python
my_data = [[1200, 2, 2, 1], [1400, 2, 2, 1], [1600, 2, 2, 1], [2600, 2, 2, 1]]
# transform() reuses the mean/std that fit_transform() computed on the training set
print(f"My normalized data: {scaler.transform(my_data)}")
print(f"My prediction using normalized data: {sgdr.predict(scaler.transform(my_data))}")
```

Output:
My prediction using normalized data: [253.13 286.99 320.84 490.13]
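For reference, here is a self-contained sketch of the same fit_transform()/transform() pattern end to end. The training data and prices are made up (so the predictions will not match the lab's), and the SGDRegressor settings are assumptions rather than the lab's exact configuration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Made-up training data: sq ft, bedrooms, floors, age -> price in $1000s.
X_train = np.array([[952, 2, 1, 65], [1244, 3, 1, 64], [1947, 3, 2, 17],
                    [1725, 3, 2, 42], [1400, 2, 1, 30], [2100, 4, 2, 10]], dtype=float)
y_train = np.array([270.0, 230.0, 510.0, 395.0, 300.0, 520.0])

scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)  # learn mean/std AND normalize the training set

sgdr = SGDRegressor(max_iter=1000, random_state=0)
sgdr.fit(X_norm, y_train)

# New houses: call transform() only, so the training-set constants are reused.
my_data = np.array([[1200, 2, 2, 1], [1400, 2, 2, 1],
                    [1600, 2, 2, 1], [2600, 2, 2, 1]], dtype=float)
preds = sgdr.predict(scaler.transform(my_data))
print(f"My prediction using normalized data: {preds}")
```

The key point is that fit_transform() is called once, on the training set. Calling fit_transform() (or fit()) on `my_data` instead would compute new constants from the four new houses, and the predictions would be wrong without any error being raised.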

Great work!