C1_W2_Lab05_Sklearn: how to predict a house price?

At the end of C1_W2_Lab05_Sklearn, w_norm and b_norm are found by scikit-learn, and predictions on the X_norm data are checked to confirm they are close to the y targets. The lab finishes there, but if we want to use w_norm & b_norm to predict the price of a new house, we first need to scale the new house's data. How do we do that with sklearn?

What I mean is, in a previous lab, we found w_norm & b_norm using gradient descent and then we wanted to predict the price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old, so we did:

x_house = np.array([1200, 3, 1, 40])                     # features: sqft, bedrooms, floors, age
x_house_norm = (x_house - X_mu) / X_sigma                # z-score normalize with the training mean & std
x_house_predict = np.dot(x_house_norm, w_norm) + b_norm  # prediction from the linear model
… but with sklearn how do we know what X_mu & X_sigma are?

Hey @evoalg,
Welcome to the community. The great thing about sklearn is that you don't even need to know the mu and sigma values. In the lab, we used StandardScaler from the sklearn library. It offers a method called transform (see the StandardScaler documentation). You just need to call this method on your new data, and it will transform that data using the mu and sigma values that the scaler computed from the training data.
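Here is a minimal sketch of that workflow (assuming X_train is the raw, unscaled training feature matrix from the lab):

import numpy as np
from sklearn.preprocessing import StandardScaler

# X_train: raw (unscaled) training features from the lab -- assumed to exist
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)    # learns mu & sigma from the training data, then scales it

x_house = np.array([[1200, 3, 1, 40]])    # new example, already 2-D with shape (1, 4)
x_house_norm = scaler.transform(x_house)  # reuses the same mu & sigma learned from the training data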

Also, if you ever do need the mu and sigma values, you can always use the mean_ and scale_ attributes of the StandardScaler object (mean_ holds mu, scale_ holds sigma; var_ holds the variance, of which scale_ is the square root), once you have called the fit or fit_transform method on your training data, i.e., once the normalization parameters have been computed.
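As a small check (continuing the sketch above), manual normalization with these attributes matches what transform does:

X_mu = scaler.mean_      # per-feature mean, i.e. mu
X_sigma = scaler.scale_  # per-feature standard deviation, i.e. sigma

x_manual = (x_house - X_mu) / X_sigma                    # manual z-score normalization
print(np.allclose(x_manual, scaler.transform(x_house)))  # True: matches transform

I hope this helps.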

Regards,
Elemento


Thank you Elemento … with your help I think I was able to figure it out!

x_house = np.array([1200, 3, 1, 40])  # shape is (4,)
X_house = x_house.reshape(1, -1)  # has to be a 2-D array (matrix) for sklearn, now shape is (1, 4)
print(f'{X_house=}')
X_house_norm = scaler.transform(X_house)  # now scaled values!
print(f'{X_house_norm=}')
X_house_predict = sgdr.predict(X_house_norm)
print(f"Predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${X_house_predict[0]*1000:0.0f}")

Also, I just noticed that the w & b values sklearn comes up with change slightly each time I run it (and so the predicted house price changes slightly each time). I'm supposing it uses some randomness somewhere in there (maybe the initial w & b values are started at a random place?).

Hey @evoalg,
The initial w and b values are the same every time, as far as I can see from the source code of sklearn's SGDRegressor class, unless you define them differently yourself. For instance, when you call the fit method of this class, you can set the initial values of the weight vector and the bias using the coef_init and intercept_init parameters; if you do not pass them, they are initialized as zeros.
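Here is a minimal sketch of that (assuming X_norm and y_train are the scaled features and targets from the lab):

import numpy as np
from sklearn.linear_model import SGDRegressor

# X_norm, y_train: scaled features and targets from the lab -- assumed to exist
sgdr = SGDRegressor(max_iter=1000)
# explicitly set the starting point for w and b (both default to zeros if omitted)
sgdr.fit(X_norm, y_train, coef_init=np.zeros(X_norm.shape[1]), intercept_init=np.zeros(1))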

You can verify this yourself in the SGDRegressor documentation and source code.

Now, the question is: so where does the difference come from? I guess it depends on the other stochastic elements of this class, for instance the shuffle and random_state parameters and the values you use for them. To validate this, you can run a small experiment in which you fix the values of these parameters and check whether you get exactly the same results. If you do, we can conclude that there is no other element of stochasticity in this class; otherwise there is one, and feel free to go through the source code to find it :nerd_face:
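A sketch of that experiment (same assumed X_norm and y_train as above):

import numpy as np
from sklearn.linear_model import SGDRegressor

runs = []
for _ in range(2):
    # a fixed random_state makes the per-epoch shuffling reproducible
    sgdr = SGDRegressor(max_iter=1000, shuffle=True, random_state=42)
    sgdr.fit(X_norm, y_train)
    runs.append((sgdr.coef_.copy(), sgdr.intercept_.copy()))

# with random_state fixed, both runs should produce identical w and b
print(np.allclose(runs[0][0], runs[1][0]) and np.allclose(runs[0][1], runs[1][1]))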

Personally, I have never cared whether the results are exactly the same each time the model is trained, because once a model is trained, you can always save it and load it back at inference time.
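For example, with joblib, which the sklearn docs recommend for persisting models (the filename here is just an example):

import joblib

joblib.dump(sgdr, 'house_price_sgdr.joblib')    # save the fitted model to disk
model = joblib.load('house_price_sgdr.joblib')  # load it back at inference time
print(model.predict(X_house_norm))

I hope this helps.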

Regards,
Elemento
