Good day,
I decided to write the whole example myself, to practice linear regression beyond what the optional labs cover.
I noticed that w and b sometimes reach NaN or Inf within 100 iterations or fewer with alpha = 0.1e-3, and it was very hard to find a learning rate and iteration count that made the algorithm behave.
On the other hand, when I normalized X (even though my data has just one feature, with values in the range 200–600), it converged in about 1k iterations with alpha = 0.1.
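Here is a minimal sketch of roughly what my experiment looks like (the data and names are illustrative, not my exact code), showing the divergence with unscaled X versus convergence after z-score normalization:

```python
import numpy as np

# Illustrative data: one feature roughly in the 200-600 range with a linear target.
rng = np.random.default_rng(0)
X = rng.uniform(200, 600, size=100)
y = 3.0 * X + 50.0 + rng.normal(0, 10, size=100)

def gradient_descent(x, y, alpha, iters):
    """Plain batch gradient descent for f(x) = w*x + b with squared-error cost."""
    w, b = 0.0, 0.0
    m = len(x)
    for _ in range(iters):
        err = (w * x + b) - y
        w -= alpha * (err @ x) / m   # dJ/dw
        b -= alpha * err.sum() / m   # dJ/db
    return w, b

# Unscaled feature: w and b blow up to inf/nan well before 100 iterations.
print(gradient_descent(X, y, alpha=1e-4, iters=100))

# Z-score normalized feature: alpha = 0.1 converges within ~1k iterations.
X_norm = (X - X.mean()) / X.std()
print(gradient_descent(X_norm, y, alpha=0.1, iters=1000))
```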
So my question is: does rescaling really affect the convergence accuracy and speed? How, and why? I don't get it.