C1_W2_Lab03_Feature_Scaling_and_Learning_Rate_Soln: cost

Comparing the cost after 100,000 iterations with alpha = 1e-7 between unnormalized and normalized features, I see that the cost after normalizing is higher:

run_gradient_descent(X_norm, y_train, 100000, alpha = 1e-7)

run_gradient_descent(X_train, y_train, 100000, alpha = 1e-7)

Can someone help me understand why this is the case?

The purpose of normalization is that it allows you to use a larger learning rate without the risk of the solution diverging.

Notice in the lab that the learning rate for the normalized data set was 0.1, rather than the 0.0000001 in the non-normalized case.
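To see the effect concretely, here is a minimal, self-contained sketch with synthetic data and a simplified `gradient_descent` helper (illustrative only, not the lab's actual code): with z-score normalization, a learning rate of 0.1 converges quickly, while the same rate diverges on the raw features, and 1e-7 barely moves the cost at all.

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    """Plain batch gradient descent for linear regression; returns the final cost."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(num_iters):
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / m
        b -= alpha * err.mean()
    return ((X @ w + b - y) ** 2).mean() / 2

rng = np.random.default_rng(0)
# Two features on very different scales, e.g. square feet vs. bedrooms.
X = np.column_stack([rng.uniform(500, 3500, 100), rng.uniform(1, 5, 100)])
y = 0.1 * X[:, 0] + 20.0 * X[:, 1] + rng.normal(0, 1, 100)

X_norm = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score normalization

with np.errstate(over="ignore", invalid="ignore"):  # the raw run will overflow
    cost_norm_big  = gradient_descent(X_norm, y, alpha=0.1,  num_iters=1000)
    cost_norm_tiny = gradient_descent(X_norm, y, alpha=1e-7, num_iters=1000)
    cost_raw_big   = gradient_descent(X,      y, alpha=0.1,  num_iters=1000)

print("normalized, alpha=0.1: ", cost_norm_big)   # converges to a low cost
print("normalized, alpha=1e-7:", cost_norm_tiny)  # barely moved from the start
print("raw,        alpha=0.1: ", cost_raw_big)    # diverged (nan/inf)
```

The point is that normalization is what makes the large learning rate usable at all; with alpha = 1e-7, neither version is really learning.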

The combination of the tiny learning rate and the non-normalized features only reached its cost values by luck: the large raw magnitudes of one of the features happened to produce a slightly better cost value.

Do you mean that for this specific training data, the non-normalized input happens to perform better than the normalized input, and that this is a rare case?

It’s not performing better. You’ve created an unrealistic test by normalizing the features and then using an extremely tiny learning rate.

Yes, the result you got is just pure luck, specific to that set of data.
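One way to see why the raw features can fluke a lower cost at alpha = 1e-7: the gradient for each weight is scaled by that feature's values, so a feature in the thousands produces a usably large update even with a tiny learning rate, while a z-scored feature (magnitude around 1) barely moves. A hedged sketch with synthetic data (variable names are illustrative, not from the lab):

```python
import numpy as np

rng = np.random.default_rng(1)
x_raw = rng.uniform(500, 3500, 100)            # raw feature, values in the thousands
x_norm = (x_raw - x_raw.mean()) / x_raw.std()  # z-scored, values roughly in [-2, 2]
y = 0.1 * x_raw + rng.normal(0, 5, 100)

alpha = 1e-7
# First gradient-descent update for w, starting from w = 0, b = 0:
#   w_new = w - alpha * mean(x * (prediction - y)) = alpha * mean(x * y)
step_raw = alpha * np.mean(x_raw * y)    # the gradient carries the feature's scale
step_norm = alpha * np.mean(x_norm * y)  # tiny: feature magnitudes are ~1

print("first step, raw feature:     ", step_raw)
print("first step, z-scored feature:", step_norm)
```

On the raw feature the very first step already moves the weight a meaningful fraction of the way toward its true value, which is exactly the accidental "head start" being discussed; on the normalized feature the same learning rate moves it by almost nothing.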