Hello,
I recently finished Course 1 in the new ML specialization by Andrew Ng. I decided to download a dataset from Kaggle and try to write my very own program to fit a Multiple Linear Regression model to the data. The set contains 30 input features and around 1460 samples. I normalized the feature data using Z-Score Normalization and everything works fine.

The only issue in my code is that the cost is decreasing but HUGE in value. When calculating the cost function, you have to find the sum of squared error between the predicted values and the target values. Remember that I normalize my input features so the predicted values are very SMALL compared to the target values (predicted in 0.___ in value and target in thousands) and when finding the sum of squared error the value will be very big thus the cost will be huge.

Now, I normalized only the input features and left the output targets as they are. I tried normalizing the targets using Z-Score and managed to reduce the cost function significantly. I have seen some posts online that forbid the idea of normalizing the output targets. Is it a bad thing to do, or is it normal and should be done to reduce my cost function?

Another question: when I test my model with a new unseen sample, should I first normalize it using the same mean and std found from the training set and then feed it to the model?

Usually we only normalize the features but not the target, and then the bias term should shift the predicted values to the level of the target values. Normalizing the target is unnecessary, but it is not really forbidden or “catastrophic” to do so. Again, since it's unnecessary, I wouldn't suggest it as a remedy.
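To illustrate why scaling the target shrinks the cost without changing the fit, here is a minimal sketch. It uses synthetic data and a closed-form least-squares fit instead of gradient descent, so the data, the coefficients, and the helper names `fit_least_squares` and `mse` are all assumptions made for the example, not taken from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the housing data (hypothetical values).
m, n = 200, 3
X = rng.normal(size=(m, n))  # features, already roughly z-scored
y = (180_000
     + X @ np.array([30_000.0, 12_000.0, 5_000.0])
     + rng.normal(scale=20_000, size=m))

def fit_least_squares(X, y):
    """Closed-form linear regression with a bias column; returns (w, b)."""
    A = np.column_stack([X, np.ones(len(X))])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta[:-1], theta[-1]

def mse(X, y, w, b):
    """Mean squared error of the linear model on (X, y)."""
    return np.mean((X @ w + b - y) ** 2)

# Fit on the raw target (dollars).
w_raw, b_raw = fit_least_squares(X, y)
cost_raw = mse(X, y, w_raw, b_raw)

# Fit on the z-scored target (unitless).
y_mean, y_std = y.mean(), y.std()
y_z = (y - y_mean) / y_std
w_z, b_z = fit_least_squares(X, y_z)
cost_z = mse(X, y_z, w_z, b_z)

# Map the z-scored predictions back to dollars.
pred_raw = X @ w_raw + b_raw
pred_from_z = (X @ w_z + b_z) * y_std + y_mean

print("cost on raw target:      ", cost_raw)
print("cost on z-scored target: ", cost_z)
print("raw cost / std(y)^2:     ", cost_raw / y_std**2)
```

The cost on the z-scored target is just the raw cost divided by the variance of `y`, and the predictions in dollars come out the same. So a smaller cost after normalizing the target reflects a change of units, not a better model.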

Which Python package do you use for the model, and did you disable the bias term?

Yes. Treat the test data the same way you treated the training data.
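As a minimal sketch of what "the same way" means here: compute the mean and std once on the training set, then reuse those exact values for any new sample. The helper names `zscore_fit` and `zscore_apply` and the tiny arrays are hypothetical, made up for illustration.

```python
import numpy as np

def zscore_fit(X_train):
    """Compute per-feature mean and std on the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    return mu, sigma

def zscore_apply(X, mu, sigma):
    """Normalize any data with statistics obtained from the training set."""
    return (X - mu) / sigma

# Hypothetical training set: [living area, number of bedrooms].
X_train = np.array([[1500.0, 3.0],
                    [2100.0, 4.0],
                    [900.0, 2.0],
                    [1800.0, 3.0]])

# A new, unseen sample.
X_new = np.array([[1600.0, 3.0]])

mu, sigma = zscore_fit(X_train)
X_train_norm = zscore_apply(X_train, mu, sigma)
X_new_norm = zscore_apply(X_new, mu, sigma)  # reuse training mu and sigma

print(X_new_norm)
```

Recomputing the mean and std on the new sample itself would put it on a different scale from the one the weights were trained on.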

Thanks for your reply. I use Google Colab. I attached the cost graph below; it also shows the final cost value and the final W and b values. As you can see, the cost is huge, but doing a prediction gave a 0.04% error. I am a bit confused as to why the cost is decreasing but still big in value. Should I rescale the features back to their original scale at some point or not?

Again, which package are you using for the modeling? Is it sklearn?

Also, what is the range of your target, from what to what?

The 0.04% error and the training curve look fine to me. The cost is ~603674135, which means very roughly a per-sample error of \sqrt{603674135} \approx 24570, i.e. about $24,570. Given the 0.04% error you mentioned, this cost value looks pretty reasonable if the target range is in the order of 10^5 or 10^6.

I built it using NumPy and Pandas, and I also wrote the cost, gradient, and GDA functions myself. I will do it in sklearn, but only after I am finished doing it the old-fashioned way (I did apply vectorization to speed up GDA). The target values range from 34,900 to 755,000. Can you tell me what the per-sample error is and why you took the square root of the final cost value?

Oh! That sounds great! Happy to know you are implementing it yourself!

Sure! I assumed you were using the squared error, which is \frac{1}{m}\sum{(f^{(i)}-y^{(i)})^2}, so this thing essentially has a unit of dollars squared. By taking the square root, we get “something” that has a unit of dollars. The reason I am doing that is to get a very rough idea of what your model's error is in terms of dollars. This IS NOT the correct way to get the average error, but it can be a way to get a rough idea. After all, that is the only info I have from you, so I am using it. Make sense?
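Here is a small sketch of that arithmetic. The final cost is the value quoted in this thread; the `errors` array is made up purely to show that the square root of the mean squared error is not the same thing as the average absolute error. The second figure assumes the cost was computed with the course's 1/(2m) convention, which the thread does not confirm.

```python
import numpy as np

# Final cost reported in the thread (units: dollars squared).
final_cost = 603_674_135.0

# Square root brings the value back to dollars: a rough per-sample error.
rough_error = np.sqrt(final_cost)

# Assumption: if the cost used a 1/(2m) factor, the RMSE would be
# sqrt(2 * cost) instead.
rmse_if_half_convention = np.sqrt(2.0 * final_cost)

# Hypothetical per-sample errors in dollars (not from the original post).
errors = np.array([10_000.0, -30_000.0, 5_000.0, -20_000.0])
mse = np.mean(errors ** 2)        # dollars squared
rmse = np.sqrt(mse)               # dollars, weights large errors more
mae = np.mean(np.abs(errors))     # dollars, the plain average error

print("rough error from cost:", rough_error)
print("with 1/(2m) convention:", rmse_if_half_convention)
print("RMSE vs MAE on toy errors:", rmse, mae)
```

The RMSE is never smaller than the mean absolute error, which is one reason the square root of the cost is only a rough guide to the typical error.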

Your targets range between the order of 10^4 and 10^5, which is in line with my expectation, so I think the cost value you are getting is pretty reasonable.

Yes, I'm using the sum of squared errors to calculate my cost. I did some experiments by generating a random sample from the min and max of each feature in my training set. I calculated the percentage error, (estimated - exact) / exact, and I got around a -1% error, sometimes less (it varies with the random generation). For the exact value, I used the mean of y_train, and that is how I got the error mentioned above.

So from what I understood, my model works pretty well and the cost is something to expect. Is that right?

1% error is certainly not good because we should compare this 1% to the 0.04% which is from your real samples. However, it makes sense that the random samples don't perform well, so I am not surprised. (PS: Is the mean value that you use for “exact” equal to ~180000?)

Yes, the cost is something to be expected. As for whether the model is good, if the 0.04% is the error on your cv set (one that is never used for training), then your model is pretty awesome!

OK! Just a side note, I could guess your “exact” value because theoretically it should be equal to your bias value, and that's why I said the following:

Also the 1% is pretty expected because your weight values are in the order of 10^3 to 10^4 and your bias is in the order of 10^6, so your random sample should get an error in the order of 0.1% or 1%, so it makes sense.
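The claim that the bias should equal the mean of the targets can be checked with a short sketch. With z-scored features every column has zero mean, so the gradient with respect to b reduces to b minus the mean of y. The data below is synthetic and the learning rate and iteration count are arbitrary choices for this example, not the settings from the original post.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic raw features with very different scales (hypothetical).
m, n = 500, 4
X_raw = rng.uniform(low=[500, 1, 0, 1900], high=[4000, 6, 3, 2010], size=(m, n))
y = (50_000 + 60.0 * X_raw[:, 0] + 10_000.0 * X_raw[:, 1]
     + rng.normal(scale=15_000, size=m))

# Z-score the features using the training statistics.
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Plain batch gradient descent on the squared-error cost.
w = np.zeros(n)
b = 0.0
alpha = 0.1
for _ in range(2000):
    err = X @ w + b - y              # prediction error per sample
    w -= alpha * (X.T @ err) / m     # gradient step for the weights
    b -= alpha * err.mean()          # gradient step for the bias

print("learned bias:  ", b)
print("mean of target:", y.mean())
```

Since the bias ends up at the mean of `y`, a prediction for any sample is that mean plus a correction from the weights, which is why comparing a random sample's prediction against the mean of y_train gives an error governed by the size of the weights relative to the bias.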