Thank you for being patient with me. At last I got it:)
However, it became possible only when I did normalization to zero mean and unit variance. Otherwise the picture is still as before.
Thank you again.
The scale of input data can also affect the cost surface.
Please let me ask you more question.
Why do we use gradient descent if the normal equation can provide the exact solution at one?
Is it the issue with computational cost?
My best regards, Vasyl.
Yes, the normal equation is unsuitable for large datasets because it is computationally complex.
It is also limited to regression problems. For these reasons, we prefer to use gradient descent.
1 Like