C1_W2_Lab05_Sklearn: how to predict a house price?

At the end of C1_W2_Lab05_Sklearn, w_norm and b_norm are found by scikit-learn, and predictions on the X_norm data are checked to confirm they are close to the y targets. The lab finishes there, but if we want to use w_norm & b_norm to predict the price of a new house, we first need to scale the new house's data. How do we do that with sklearn?

What I mean is, in a previous lab, we found w_norm & b_norm using gradient descent and then we wanted to predict the price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old, so we did:

x_house = np.array([1200, 3, 1, 40])                     # features: sqft, bedrooms, floors, age
x_house_norm = (x_house - X_mu) / X_sigma                # z-score normalize with the training mean & std
x_house_predict = np.dot(x_house_norm, w_norm) + b_norm  # prediction from the linear model
… but with sklearn how do we know what X_mu & X_sigma are?

Hey @evoalg,
Welcome to the community. The great thing about sklearn is that you don't even need to know the mu and sigma values. In the lab, we used StandardScaler from the sklearn library. It offers a method called transform (see the StandardScaler documentation). You just need to call this method on your new data, and it will transform that data using the mu and sigma values that the scaler computed from the training data.
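Here is a minimal sketch of that workflow (assuming X_train is the raw, unscaled training feature matrix from the lab):

import numpy as np
from sklearn.preprocessing import StandardScaler

# X_train: raw (unscaled) training features from the lab -- assumed to exist
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)    # learns mu & sigma from the training data, then scales it

x_house = np.array([[1200, 3, 1, 40]])    # new example, already 2-D with shape (1, 4)
x_house_norm = scaler.transform(x_house)  # reuses the same mu & sigma learned from the training data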

Also, if you ever do need the mu and sigma values, you can always use the mean_ and scale_ attributes of the StandardScaler object (mean_ holds mu, scale_ holds sigma; var_ holds the variance, of which scale_ is the square root), once you have called the fit or fit_transform method on your training data, i.e., once the normalization parameters have been computed.
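As a small check (continuing the sketch above), manual normalization with these attributes matches what transform does:

X_mu = scaler.mean_      # per-feature mean, i.e. mu
X_sigma = scaler.scale_  # per-feature standard deviation, i.e. sigma

x_manual = (x_house - X_mu) / X_sigma                    # manual z-score normalization
print(np.allclose(x_manual, scaler.transform(x_house)))  # True: matches transform

I hope this helps.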

Regards,
Elemento


Thank you Elemento … with your help I think I was able to figure it out!

x_house = np.array([1200, 3, 1, 40])  # shape is (4,)
X_house = x_house.reshape(1, -1)  # has to be a 2-D array (matrix) for sklearn, now shape is (1, 4)
print(f'{X_house=}')
X_house_norm = scaler.transform(X_house)  # now scaled values!
print(f'{X_house_norm=}')
X_house_predict = sgdr.predict(X_house_norm)
print(f"Predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${X_house_predict[0]*1000:0.0f}")

Also, I just noticed that the w & b values sklearn comes up with change slightly each time I run it (and so the predicted house price changes slightly each time). I'm supposing it uses some randomness somewhere in there (maybe the initial w & b values are started at a random place?).

Hey @evoalg,
The initial w and b values are the same every time, as far as I can see from the source code of sklearn's SGDRegressor class, unless you define them differently yourself. For instance, when you call the fit method of this class, you can set the initial values of the weight vector and the bias using the coef_init and intercept_init parameters; if you do not pass them, they are initialized as zeros.
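Here is a minimal sketch of that (assuming X_norm and y_train are the scaled features and targets from the lab):

import numpy as np
from sklearn.linear_model import SGDRegressor

# X_norm, y_train: scaled features and targets from the lab -- assumed to exist
sgdr = SGDRegressor(max_iter=1000)
# explicitly set the starting point for w and b (both default to zeros if omitted)
sgdr.fit(X_norm, y_train, coef_init=np.zeros(X_norm.shape[1]), intercept_init=np.zeros(1))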

You can verify this yourself in the SGDRegressor documentation and source code.

Now, the question is: so where does the difference come from? I guess it depends on the other stochastic elements of this class, for instance the shuffle and random_state parameters and the values you use for them. To validate this, you can run a small experiment in which you fix the values of these parameters and check whether you get exactly the same results. If you do, we can conclude that there is no other element of stochasticity in this class; otherwise there is one, and feel free to go through the source code to find it :nerd_face:
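A sketch of that experiment (same assumed X_norm and y_train as above):

import numpy as np
from sklearn.linear_model import SGDRegressor

runs = []
for _ in range(2):
    # a fixed random_state makes the per-epoch shuffling reproducible
    sgdr = SGDRegressor(max_iter=1000, shuffle=True, random_state=42)
    sgdr.fit(X_norm, y_train)
    runs.append((sgdr.coef_.copy(), sgdr.intercept_.copy()))

# with random_state fixed, both runs should produce identical w and b
print(np.allclose(runs[0][0], runs[1][0]) and np.allclose(runs[0][1], runs[1][1]))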

Personally, I have never cared whether the results are exactly the same each time the model is trained, because once a model is trained, you can always save it and load it back at inference time.
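For example, with joblib, which the sklearn docs recommend for persisting models (the filename here is just an example):

import joblib

joblib.dump(sgdr, 'house_price_sgdr.joblib')    # save the fitted model to disk
model = joblib.load('house_price_sgdr.joblib')  # load it back at inference time
print(model.predict(X_house_norm))

I hope this helps.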

Regards,
Elemento
