Can someone help explain this line?

This line is mentioned in Week 2’s optional lab for feature scaling and learning rate: “when generating the plot, the normalized features were used. Any predictions using the parameters learned from a normalized training set must also be normalized.”

Does this mean that the value our model predicts will need to be de-normalized? Because right after this quote, the lab has us predict a price for a house, and it does not show any de-normalization.

No, I think the point is that if the model is trained on normalized data, then when you want to make a prediction on new data, that input data needs to be normalized in the same way. The model only “understands” normalized data, so that’s all it can accept as input. For example, if you are doing “mean normalization” and predicting on a single sample, it doesn’t make sense to use the “mean” of a single sample. You have to save the normalization parameters (\mu and \sigma) from the training set and apply those to any new input.
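Here is a minimal NumPy sketch of that idea. The data values, and the `w` and `b` names, are placeholders for illustration, not taken from the lab:

```python
import numpy as np

# Hypothetical training data: [square feet, bedrooms]; values are placeholders
X_train = np.array([[2104, 5],
                    [1416, 3],
                    [1534, 3],
                    [852,  2]], dtype=float)

# Compute and SAVE the normalization parameters from the training set
mu = X_train.mean(axis=0)       # per-feature mean
sigma = X_train.std(axis=0)     # per-feature standard deviation

X_norm = (X_train - mu) / sigma  # z-score normalize the training features
# ... train the model on X_norm, learning some weights w and bias b ...

# To predict on a NEW sample, reuse the training-set mu and sigma;
# do NOT recompute them from the single new sample.
x_new = np.array([1200, 3], dtype=float)
x_new_norm = (x_new - mu) / sigma
# y_hat = np.dot(x_new_norm, w) + b   # the prediction is already in dollars
```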

I should make the disclaimer that I have not taken MLS, so I’m not sure what Prof Ng says in the lectures there. I would hope that he said something about this and that it would be along the lines of what I said above.


There wasn’t much mention of it in the actual lectures, but as stated earlier it was mentioned in a lab. My current interpretation is that if we normalize the whole training set (targets as well as features), then we would need to de-normalize the prediction, because the model was trained on normalized y-values.

No, the predictions are the predictions, right? What would it mean to “de-normalize” them? You don’t normalize the y values, right? Those are just prices in dollars in this case.

The point of normalization is that it affects only the input feature data, and its purpose is to make the training work better (converge faster). Notice that you have wildly divergent values for the different features: the number of bedrooms is between 1 and 6, whereas square footage is in the range of hundreds to thousands, right? That makes the solution surface very steep in some dimensions and shallow in others, which makes it hard for Gradient Descent to converge efficiently. Normalizing all the input features to have (say) \mu = 0 and \sigma = 1 gives a much better behaved surface and easier (in most cases) convergence.
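As a quick illustration of that scale mismatch (made-up numbers, not the lab’s data):

```python
import numpy as np

# Made-up features on wildly different scales: bedrooms vs. square feet
X = np.array([[5, 2104],
              [3, 1416],
              [3, 1534],
              [2,  852]], dtype=float)
print(np.ptp(X, axis=0))    # raw ranges: [3. 1252.] -- a factor of ~400 apart

# After z-score normalization, every feature has mean 0 and std 1,
# so the cost surface is similarly scaled in every dimension.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm.mean(axis=0))  # approximately [0. 0.]
print(X_norm.std(axis=0))   # [1. 1.]
```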


Yes, I agree with you. What I meant is that the statement refers to the general case, where people may have normalized the target data as well before fitting a model. It does not refer to the specific example demonstrated in the lab.

So, if we were to normalize the training set as a whole (targets included), only then would we need to de-normalize the prediction, as sketched below. Otherwise, if we normalized only the features and not the targets, we would not need to de-normalize the output.

For all intents and purposes, I agree that we need not normalize the target values in the first place (and thus do not need to de-normalize the output), as demonstrated in the lab.
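For concreteness, here is a sketch of that hypothetical case. All the numbers, and the `y_pred_norm` placeholder, are made up; again, the lab does not actually do this:

```python
import numpy as np

# Hypothetical case only: suppose the targets (prices, in $1000s;
# made-up values) had been z-score normalized before training.
y_train = np.array([460.0, 232.0, 315.0, 178.0])
mu_y, sigma_y = y_train.mean(), y_train.std()
y_norm = (y_train - mu_y) / sigma_y   # the model would be trained against these

# Such a model predicts in normalized units, so its raw output has to be
# mapped back to dollars:
y_pred_norm = 0.5                       # placeholder model output
y_pred = y_pred_norm * sigma_y + mu_y   # de-normalize to recover an actual price
```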

I have never seen a case in which the target (label) values or the output of a network were normalized. You apply an output activation function (e.g. sigmoid in the case of a binary classification) and then a cost function (cross entropy for a classification or MSE for a regression problem). But you never “normalize” the output; at least I’ve never seen an instance of that. If you have seen references to that, please give us a link so that I can investigate further.

No, I have not seen any such case. I just have this statement to go off of, which says: “Any predictions using the parameters learned from a normalized training set must also be normalized.”

If this hadn’t been mentioned at all, I would have assumed the result was the actual price of the house (since we didn’t normalize the target value, as you previously mentioned).

Maybe it meant that if we were to plot price against a normalized feature, we would need to normalize the price in order to make a good plot(?)

I think you are just misinterpreting that statement. I explained what they really meant in my first response on this thread. You only need to normalize the features in order to make a prediction, because the model is trained only on normalized feature data.


Okay, I think I now understand it perfectly. Thank you for the help and for keeping up with this query 🙂