We know that when a function has multiple maximum or minimum points, we should try multiple initial points to improve our chances of finding the global one. Is the same true for Newton’s method?

Typically we’re only concerned with functions that have a single minimum (and grow without bound, so there are no finite maxima). We prefer convex cost functions because they have a unique minimum, so any starting point leads to the same solution.

Newton’s method is intended to find a function’s roots (where its y-value is zero). For that we only need a single starting point: we evaluate the function and its derivative there, and proceed iteratively.
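As a minimal sketch of that iteration (the example function x² − 2 and the tolerance are just illustrative choices, not anything from the course):

```python
def newton(f, df, x0, tol=1e-10, max_iter=100):
    # Repeatedly apply the Newton update x <- x - f(x)/f'(x)
    # until f(x) is close enough to zero.
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x = x - fx / df(x)
    return x

# Find the positive root of f(x) = x^2 - 2 (i.e. sqrt(2)),
# starting from the single point x0 = 1.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```

Note that one starting point suffices here; the derivative at each iterate tells the method which way to move next.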

So, what should I consider when choosing between gradient descent and Newton’s method for a given problem?

If you’re trying to minimize a cost function, use gradient descent.
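For instance, a toy sketch of gradient descent on a convex cost (the quadratic cost, learning rate, and step count here are made up for illustration):

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    # Repeatedly step downhill: x <- x - lr * grad(x)
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize the convex cost J(x) = (x - 3)^2, whose gradient is 2*(x - 3).
# Because the cost is convex, very different starting points
# converge to the same unique minimum at x = 3.
x_a = gradient_descent(lambda x: 2 * (x - 3), x0=-10.0)
x_b = gradient_descent(lambda x: 2 * (x - 3), x0=25.0)
```

This also illustrates the point about convexity above: with a single minimum, there is no need to try multiple initial points.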

I really don’t know why they put Newton’s method into this course; maybe mathematicians find it fun and interesting. It is also a useful example of an iterative method, which may be why it’s here.

I’ve never had occasion to use Newton’s method directly in any machine learning situation.