Week 2 lab 3 y_train

  1. These two sentences (in green) seem contradictory. One says the plot uses the original feature values, the other says it uses the normalized feature values.
  2. In this example, we normalized all input features to make sure they are all on a similar scale. But why don't we normalize the target value (y_train) too, to make sure it is also on a similar scale?
1 Like

Hi @flyunicorn

You are correct, and thanks for your attention to detail. The plots show the original feature values, not the normalized ones. I think the aim of the second sentence was to point out that the normalized data are what the model is trained on and what the predictions are computed from.

Input features (X_train) are typically normalized to ensure that all features contribute appropriately to the learning process. The target variable, however, is the actual value the model is trying to predict. You can normalize y_train, but it is often unnecessary and less direct.
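For example (a toy sketch with made-up numbers, not the lab's code): if you did normalize y_train, the model's outputs would be in normalized units, and every prediction would need one extra step to map it back to an actual price.

import numpy as np

y_train = np.array([300.0, 509.8, 394.0, 540.0])     # toy target prices (in 1000s of dollars)
y_mu, y_sigma = y_train.mean(), y_train.std()
y_train_norm = (y_train - y_mu) / y_sigma             # the model would be fit on this instead

# Suppose yp_norm holds a model's predictions in normalized units;
# they have to be un-scaled before they mean anything as prices.
yp_norm = np.array([-0.8, 0.9, -0.2, 1.1])
yp = yp_norm * y_sigma + y_mu                          # extra step to recover actual prices
print(yp)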

Hope it helps! Feel free to ask if you need further assistance.

2 Likes

Hi @flyunicorn great question!

The target values are not normalized here since normalizing them would introduce a bias (you would be injecting information into the target, which is a bias). Usually you leave the targets unchanged when training your models, so they learn the real behavior of the data.

Your machine learning model produces an equation that predicts the price of a house directly, so you don't need to normalize the target unless you are trying to predict the price change or something similar, which is not the case here.

You are correct: in this graph, the predictions are made using the normalized features, while the plot shows the original feature values.

I hope this helps!

2 Likes

Let's break down the code.

Predict target using normalized features

m = X_norm.shape[0]
yp = np.zeros(m)
for i in range(m):
    yp[i] = np.dot(X_norm[i], w_norm) + b_norm

Note: This confirms that the predictions (yp) are made using X_norm (normalized features).
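As a side note, the same predictions can be computed in one vectorized line (assuming the same X_norm, w_norm and b_norm as above):

yp = X_norm @ w_norm + b_norm   # same result as the loop, computed for all rows at once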

Now, on to the second part.

Plot predictions and targets versus original features

fig, ax = plt.subplots(1, 4, figsize=(12, 3), sharey=True)
for i in range(len(ax)):
    ax[i].scatter(X_train[:, i], y_train, label='target')  # Original feature values
    ax[i].set_xlabel(x_features[i])
    ax[i].scatter(X_train[:, i], yp, color=dlc["dlorange"], label='predict')

Here’s the key observation:

  • X_train[:, i] is used for the x-axis → This suggests the original feature values are being used.
  • yp (the predictions) is plotted against X_train[:, i] (original features) → this looks like a potential mismatch, since yp was computed using X_norm!

Contradiction:

  1. The first green-highlighted statement says the plot uses original feature values (which is correct based on X_train in the scatter plots).
  2. The second green-highlighted statement says that normalized features are used when generating the plot, which reads as incorrect, but I think it is actually referring to the normalized features used to compute the predictions.

The sentence should instead say:

“When generating predictions, normalized features are used, but the plot is shown using original feature values.”

Hope it helps.

1 Like

Hi, @flyunicorn,

I would just like to offer a different angle on your question above.

First, we need to remember that the reason for normalizing the features is faster convergence. We use only one learning rate for all the weights, so we want all the features to be on a similar scale. If you are not sure about this part, you might want to go back to the lecture for Andrew's explanations.
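Just to make "a similar scale" concrete, here is a minimal sketch of what z-score normalization of the features does (not necessarily the exact helper used in the lab):

import numpy as np

def zscore_normalize(X):
    # Rescale every column to zero mean and unit standard deviation,
    # so a single learning rate works reasonably well for every weight.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

X_norm, mu, sigma = zscore_normalize(np.array([[2104., 5., 45.],
                                               [1416., 3., 40.],
                                               [852., 2., 35.]]))   # toy feature rows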

Now, when considering the need for normalizing the labels, if we rearrange our linear model in the way below, we see that the bias and the weights can be trained to take the places of the normalization parameters (i.e. the mean and the standard deviation, respectively).
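One way to write this out (a sketch, using $\mu$ for the label mean and $s$ for the label standard deviation): a linear model fitted to the normalized labels is

$$\frac{y - \mu}{s} = w \cdot x + b,$$

and rearranging for the raw label gives

$$y = (s\,w) \cdot x + (s\,b + \mu).$$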

It may take some time for you to digest the relation between the final equation above and normalizing the label, but if you get it, you will see that,

  • if you don't normalize, your bias will be trained to equal (roughly) the mean of the labels,
  • if you don't normalize, all of your weights will be scaled up/down equally by the factor of s. In other words, if your labels spread over a large range, then without normalization your weights will all be amplified by that same amount.

Therefore, even if you don't normalize the labels, the training does the rest for you. Even if you don't normalize the labels, you won't run into the problem that, as exemplified in the lecture, unnormalized features can give you. If you don't normalize the labels, and your labels spread over a wide range, and you use a small learning rate, and your initial weights start small, a possible minor issue is that it may take some more iterations to get to the optimal weights.
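If you want to see this concretely, here is a small sketch (synthetic data and a plain least-squares fit rather than the lab's gradient descent): with zero-mean features, the bias fitted to the raw labels lands on the label mean, and the weights are the normalized-label weights scaled by s.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 4 zero-mean features, labels with a large mean and a wide spread
X_norm = rng.normal(size=(200, 4))
X_norm = (X_norm - X_norm.mean(axis=0)) / X_norm.std(axis=0)
y = X_norm @ np.array([100.0, 50.0, -30.0, 20.0]) + 400.0 + rng.normal(scale=5.0, size=200)

mu, s = y.mean(), y.std()
y_norm = (y - mu) / s

# Least-squares fit with an intercept column appended to the features
A = np.c_[X_norm, np.ones(len(y))]
theta_raw = np.linalg.lstsq(A, y, rcond=None)[0]        # fit against the raw labels
theta_nrm = np.linalg.lstsq(A, y_norm, rcond=None)[0]   # fit against the normalized labels

print("bias on raw labels:", theta_raw[-1], "vs label mean:", mu)        # nearly equal
print("weight ratios raw/normalized:", theta_raw[:-1] / theta_nrm[:-1])  # each close to s
print("label standard deviation s:", s)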

Cheers,
Raymond