The Effect of Feature Rescaling on Convergence

Good day,
I decided to write out the entire example myself to practice linear regression beyond the optional labs.

I noticed that w and b sometimes reach NaN or Inf after 100 iterations or fewer with alpha = 0.1e-3, and it was very hard to find a learning rate and number of iterations that made the algorithm converge.

On the other hand, when I normalized X (even though my data has just one feature, with a range of 200–600), it converged in about 1k iterations with alpha = 0.1.

So my question is: does rescaling really affect the convergence speed and accuracy? How, and why? I don't get it.


Yes, the learning rate is closely related to the magnitudes of the features. This is because the gradient for each weight is the average over the training set of the error multiplied by the corresponding feature value x⁽ⁱ⁾, so large feature values produce proportionally large gradients. You then multiply the gradient by the learning rate to get the change in the weight values.
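As a minimal sketch of this (using a synthetic single feature in the 200–600 range, mirroring the numbers above — the data and names here are assumptions for illustration), you can compare the very first w-gradient computed on the raw feature against the same gradient on a z-score-normalized copy:

```python
import numpy as np

# Synthetic single feature in the 200-600 range (an assumption for
# illustration), with a simple linear target.
x = np.linspace(200.0, 600.0, 50)
y = 2.0 * x + 30.0

w = b = 0.0
err = (w * x + b) - y                      # errors at the starting point

grad_w_raw = (err @ x) / len(x)            # scales with x, i.e. with hundreds
x_norm = (x - x.mean()) / x.std()          # z-score: mean 0, std 1
grad_w_norm = (err @ x_norm) / len(x)      # scales with x_norm, i.e. with ~1

print(abs(grad_w_raw), abs(grad_w_norm))   # ~3.6e5 vs ~2.4e2
```

The raw gradient is over a thousand times larger, so any learning rate large enough to move b meaningfully is far too large for w.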

When you normalize the features, you can use a larger learning rate without any individual feature causing the solution to diverge.
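A small end-to-end sketch (again on synthetic data with one feature in the 200–600 range — the data, seed, and helper name are assumptions) reproduces both behaviors from the question: gradient descent blows up on the raw feature even with alpha = 0.1e-3, but converges with alpha = 0.1 once the feature is normalized:

```python
import numpy as np

# Synthetic data (an assumption for illustration): one feature in 200-600.
rng = np.random.default_rng(0)
x = rng.uniform(200, 600, size=50)
y = 2.0 * x + 30.0 + rng.normal(0.0, 10.0, size=50)

def gradient_descent(x, y, alpha, iters):
    """Batch gradient descent for y ~ w*x + b with squared-error cost."""
    w = b = 0.0
    m = len(x)
    for _ in range(iters):
        err = (w * x + b) - y
        w -= alpha * (err @ x) / m   # dJ/dw = mean(err * x)
        b -= alpha * err.mean()      # dJ/db = mean(err)
    return w, b

# Raw feature, alpha = 0.1e-3: each step multiplies the error in w by
# roughly (1 - alpha * mean(x**2)), about -16 here, so w blows up.
with np.errstate(all="ignore"):          # silence the overflow warnings
    w_raw, b_raw = gradient_descent(x, y, alpha=0.1e-3, iters=500)
print(np.isfinite(w_raw))                # False: w has overflowed

# Z-score normalized feature, alpha = 0.1: now mean(x_norm**2) = 1, so
# the same update contracts the error and converges comfortably.
x_norm = (x - x.mean()) / x.std()
w_n, b_n = gradient_descent(x_norm, y, alpha=0.1, iters=1000)
print(w_n, b_n)                          # finite, close to the least-squares fit
```

With the normalized feature the largest stable learning rate is governed by mean(x_norm\*\*2) = 1 instead of mean(x\*\*2) ≈ 170,000, which is why alpha can be several orders of magnitude larger.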
