Week 2 lab 3 y_train

  1. These two sentences (in green) seem contradictory. One says the plot uses the original feature values, the other says it uses the normalized feature values.
  2. In this example, we normalized all input features to make sure they are all on a similar scale. But why don't we normalize the target value (y_train) too, to make sure it is also on a similar scale?
1 Like

Hi @flyunicorn

You are correct, and thanks for your attention to detail. The plots show the original feature values, not the normalized ones. I think the aim of the second sentence was to point out that the normalized data are what the model is trained on and what the predictions are computed from.

Input features (X_train) are typically normalized to ensure that all features contribute appropriately to the learning process. The target variable, however, is the actual value the model is trying to predict. You can normalize y_train, but it is often unnecessary and less direct.
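For example (a toy sketch with made-up numbers, not the lab's code): if you did normalize y_train, the model's outputs would be in normalized units, and every prediction would need one extra step to map it back to an actual price.

import numpy as np

y_train = np.array([300.0, 509.8, 394.0, 540.0])     # toy target prices (in 1000s of dollars)
y_mu, y_sigma = y_train.mean(), y_train.std()
y_train_norm = (y_train - y_mu) / y_sigma             # the model would be fit on this instead

# Suppose yp_norm holds a model's predictions in normalized units;
# they have to be un-scaled before they mean anything as prices.
yp_norm = np.array([-0.8, 0.9, -0.2, 1.1])
yp = yp_norm * y_sigma + y_mu                          # extra step to recover actual prices
print(yp)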

Hope it helps! Feel free to ask if you need further assistance.

2 Likes

Hi @flyunicorn great question!

The target values are not normalized here since normalizing them would introduce a bias (you would be injecting information into the target, which is a bias). Usually you leave the targets unchanged when training your models, so they learn the real behavior of the data.

Your machine learning model produces an equation that predicts the price of a house directly, so you don't need to normalize the target unless you are trying to predict the price change or something similar, which is not the case here.

You are correct: in this graph, the predictions are made using the normalized features, while the plot shows the original feature values.

I hope this helps!

2 Likes

Let's break down the code.

Predict target using normalized features

m = X_norm.shape[0]
yp = np.zeros(m)
for i in range(m):
    yp[i] = np.dot(X_norm[i], w_norm) + b_norm

Note: This confirms that the predictions (yp) are made using X_norm (normalized features).
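As a side note, the same predictions can be computed in one vectorized line (assuming the same X_norm, w_norm and b_norm as above):

yp = X_norm @ w_norm + b_norm   # same result as the loop, computed for all rows at once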

Now, on to the second part.

Plot predictions and targets versus original features

fig, ax = plt.subplots(1, 4, figsize=(12, 3), sharey=True)
for i in range(len(ax)):
    ax[i].scatter(X_train[:, i], y_train, label='target')  # Original feature values
    ax[i].set_xlabel(x_features[i])
    ax[i].scatter(X_train[:, i], yp, color=dlc["dlorange"], label='predict')

Here’s the key observation:

  • X_train[:, i] is used for the x-axis → This suggests the original feature values are being used.
  • yp (the predictions) is plotted against X_train[:, i] (original features) → this looks like a potential mismatch, since yp was computed using X_norm!

Contradiction:

  1. The first green-highlighted statement says the plot uses original feature values (which is correct based on X_train in the scatter plots).
  2. The second green-highlighted statement says that normalized features are used when generating the plot, which reads as incorrect, but I think it is actually referring to the normalized features used to compute the predictions.

The sentence should instead say:

“When generating predictions, normalized features are used, but the plot is shown using original feature values.”

Hope it helps.

1 Like

Hi, @flyunicorn,

I would just like to offer a different angle on your question above.

First, we need to remember that the reason for normalizing the features is faster convergence. We use only one learning rate for all the weights, so we want all the features to be on a similar scale. If you are not sure about this part, you might want to go back to the lecture for Andrew's explanations.
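Just to make "a similar scale" concrete, here is a minimal sketch of what z-score normalization of the features does (not necessarily the exact helper used in the lab):

import numpy as np

def zscore_normalize(X):
    # Rescale every column to zero mean and unit standard deviation,
    # so a single learning rate works reasonably well for every weight.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

X_norm, mu, sigma = zscore_normalize(np.array([[2104., 5., 45.],
                                               [1416., 3., 40.],
                                               [852., 2., 35.]]))   # toy feature rows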

Now, when considering the need for normalizing the labels, if we rearrange our linear model in the way below, we see that the bias and the weights can be trained to take the places of the normalization parameters (i.e. the mean and the standard deviation, respectively).
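One way to write this out (a sketch, using $\mu$ for the label mean and $s$ for the label standard deviation): a linear model fitted to the normalized labels is

$$\frac{y - \mu}{s} = w \cdot x + b,$$

and rearranging for the raw label gives

$$y = (s\,w) \cdot x + (s\,b + \mu).$$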

It may take some time for you to digest the relation between the final equation above and normalizing the label, but if you get it, you will see that,

  • if you don't normalize, your bias will be trained to equal (roughly) the mean of the labels,
  • if you don't normalize, all of your weights will be scaled up/down equally by the factor of s. In other words, if your labels spread over a large range, then without normalization your weights will all be amplified by that same amount.

Therefore, even if you don't normalize the labels, the training does the rest for you. Even if you don't normalize the labels, you won't run into the problem that, as exemplified in the lecture, unnormalized features can give you. If you don't normalize the labels, and your labels spread over a wide range, and you use a small learning rate, and your initial weights start small, a possible minor issue is that it may take some more iterations to get to the optimal weights.
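If you want to see this concretely, here is a small sketch (synthetic data and a plain least-squares fit rather than the lab's gradient descent): with zero-mean features, the bias fitted to the raw labels lands on the label mean, and the weights are the normalized-label weights scaled by s.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 4 zero-mean features, labels with a large mean and a wide spread
X_norm = rng.normal(size=(200, 4))
X_norm = (X_norm - X_norm.mean(axis=0)) / X_norm.std(axis=0)
y = X_norm @ np.array([100.0, 50.0, -30.0, 20.0]) + 400.0 + rng.normal(scale=5.0, size=200)

mu, s = y.mean(), y.std()
y_norm = (y - mu) / s

# Least-squares fit with an intercept column appended to the features
A = np.c_[X_norm, np.ones(len(y))]
theta_raw = np.linalg.lstsq(A, y, rcond=None)[0]        # fit against the raw labels
theta_nrm = np.linalg.lstsq(A, y_norm, rcond=None)[0]   # fit against the normalized labels

print("bias on raw labels:", theta_raw[-1], "vs label mean:", mu)        # nearly equal
print("weight ratios raw/normalized:", theta_raw[:-1] / theta_nrm[:-1])  # each close to s
print("label standard deviation s:", s)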

Cheers,
Raymond