Query regarding Week 1 gradient descent algorithm efficiency

In the context of gradient descent and the “Learning rate” lecture, how should we think about efficiency and convergence when the loss surface has a large gap between a poor local minimum and the global minimum, and there are many inflection/saddle points in between?

At this point in your ML studies, you can safely assume that all cost functions are convex, so any minimum gradient descent finds is the global minimum.

The rare situations where this isn’t true will be discussed later in the course.

It’s a good question, but I recommend you not be concerned about it during Week 1.
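To see why convexity makes the Week 1 picture simple, here is a minimal sketch (not from the course materials) of gradient descent on a toy convex cost J(w) = (w - 3)^2, whose only stationary point is the global minimum at w = 3. The function, learning rates, and step counts are illustrative assumptions.

```python
def gradient_descent(lr, steps=100, w=0.0):
    """Minimize the toy convex cost J(w) = (w - 3)^2 with a fixed learning rate."""
    for _ in range(steps):
        grad = 2 * (w - 3)  # dJ/dw for this toy cost
        w -= lr * grad      # standard gradient descent update
    return w

# A moderate learning rate converges to the global minimum at w = 3.
print(round(gradient_descent(lr=0.1), 4))  # → 3.0

# Too large a learning rate overshoots and diverges instead.
print(abs(gradient_descent(lr=1.1, steps=20) - 3) > 1)  # → True
```

On a convex surface like this there are no poor local minima or saddle points to worry about, so the only efficiency question left is choosing a learning rate small enough to converge, which is exactly the focus of the Week 1 "Learning rate" lecture.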