Convergence of Gradient Descent

In the convergence of gradient descent lecture in week 2, Andrew speaks about convergence to the global minimum. I am confused, since in earlier lectures he said that we will find a local minimum, and since at a local minimum the derivative term is zero, the parameters won't change afterwards.

Can anybody explain what I have misunderstood?

Both statements are consistent. Gradient descent in general can only guarantee convergence to a local minimum. However, for the cost functions used in those lectures (e.g. the squared-error cost for linear regression) the cost is convex: it is bowl-shaped, so there is exactly one minimum, and there aren't any other local minima that can trap the convergence. In that case the local minimum that gradient descent finds *is* the global minimum.
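To make this concrete, here is a minimal sketch (my own toy example, not from the course): gradient descent on the convex cost J(w) = (w - 3)², whose only minimum, local or global, is at w = 3. No matter where we start, the updates settle there.

```python
def gradient_descent(lr=0.1, steps=100):
    """Minimize the convex cost J(w) = (w - 3)**2 by gradient descent."""
    w = 0.0  # arbitrary starting point
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative dJ/dw
        w -= lr * grad       # update rule: w := w - alpha * dJ/dw
    return w

print(gradient_descent())  # converges toward 3.0, the global minimum
```

At w = 3 the derivative is zero, so the parameter stops changing, exactly as described in the earlier lectures; convexity just guarantees that this stopping point is the global minimum.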