Hello,
In the lecture, the contour plot of the cost function had an elongated shape when the features were not normalized. Each gradient descent step moves in the direction perpendicular to the contour lines, but with elongated contours that direction usually does not point straight at the minimum of the cost function. As a result, convergence can take much longer.
However, after normalization, the contours become roughly circular. Gradient descent still moves in the direction perpendicular to the contour lines, but now that direction points almost directly at the minimum.
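To make the intuition concrete, here is a minimal sketch rather than anything from the lecture. It assumes a toy quadratic cost J(w) = 0.5 * (c1 * w1^2 + c2 * w2^2), where the curvatures c1 and c2 are hypothetical values standing in for unnormalized features (very different curvatures, elongated contours) and normalized features (equal curvatures, circular contours). The function name `steps_to_converge` and the learning-rate choice (1 / largest curvature) are also my own assumptions.

```python
import numpy as np

def steps_to_converge(curvature, start=(1.0, 1.0), tol=1e-6, max_steps=100_000):
    """Run gradient descent on J(w) = 0.5 * sum(curvature * w**2).

    Returns the number of steps taken before ||w|| drops below tol
    (the minimum of this toy cost is at w = 0).
    """
    curvature = np.asarray(curvature, dtype=float)
    w = np.asarray(start, dtype=float)
    # Assumed step size: the largest one that cannot overshoot along
    # the steepest axis of the bowl.
    lr = 1.0 / curvature.max()
    for step in range(max_steps):
        if np.linalg.norm(w) < tol:
            return step
        grad = curvature * w  # gradient of the toy quadratic cost
        w = w - lr * grad
    return max_steps

# Elongated contours: curvature differs by a factor of 50 between axes.
steps_elongated = steps_to_converge([1.0, 50.0])
# Circular contours: same curvature along both axes.
steps_circular = steps_to_converge([1.0, 1.0])

print("elongated contours:", steps_elongated, "steps")
print("circular contours: ", steps_circular, "steps")
```

In this toy setup the step size is limited by the steep axis, so progress along the shallow axis is slow when the contours are elongated. With circular contours the negative gradient points straight at the minimum, and the same rule for choosing the step size reaches it far sooner.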
Is this understanding correct?
Regards,
Divyaman