In the week 2 lecture on convergence of gradient descent, Andrew speaks about convergence to the global minimum. I am confused, since in earlier lectures he said that gradient descent finds a local minimum, and since the derivative term is zero at a local minimum, the parameters won't change after that point.
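To make my understanding concrete, here is a minimal sketch of the update rule as I understand it from the lectures. The cost J(w) = (w - 3)^2, the learning rate, and the variable names are just my own example, not from the course:

```python
def dJ_dw(w):
    # Derivative of my example cost J(w) = (w - 3)^2
    return 2 * (w - 3)

w = 0.0       # initial parameter (my own choice)
alpha = 0.1   # learning rate (my own choice)

for step in range(100):
    w = w - alpha * dJ_dw(w)   # update rule: w := w - alpha * dJ/dw

print(w)  # w approaches 3, where dJ/dw = 0, so w stops changing
```

So once the derivative reaches zero, the update adds nothing and the parameters stay put, which is why I don't see how this guarantees the global minimum rather than just a local one.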
Can anybody explain whether I have misunderstood any part of this?