I asked ChatGPT about the look of the "weight landscape" and it gave good pointers

Well, the conditions under which that convexity claim holds are very limited: it applies only to logistic regression, which Prof Ng points out can be viewed as a trivial neural network consisting of just the output layer. Once you add even a single hidden layer, convexity is history. As the thread David linked shows, the number of local minima is huge.
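To make the contrast concrete, here is a minimal NumPy sketch (the toy data, network sizes, and the chord test itself are my own assumptions, not from the linked thread). It checks midpoint convexity, f((a+b)/2) <= (f(a)+f(b))/2, along random chords in parameter space: the logistic-regression cross-entropy should never violate it, while the same loss with one hidden tanh layer typically does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (sizes chosen arbitrarily, just for illustration).
m, n, h = 200, 5, 4                     # examples, features, hidden units
X = rng.normal(size=(m, n))
y = (rng.uniform(size=m) < 0.5).astype(float)

def logreg_loss(w):
    """Cross-entropy of plain logistic regression: provably convex in w."""
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)  # numerically stable log loss

def mlp_loss(theta):
    """Same loss with one hidden tanh layer: no longer convex in theta."""
    W1 = theta[: n * h].reshape(n, h)
    w2 = theta[n * h:]
    z = np.tanh(X @ W1) @ w2
    return np.mean(np.logaddexp(0.0, z) - y * z)

def chord_violations(loss, dim, trials=500):
    """Count failures of midpoint convexity f((a+b)/2) <= (f(a)+f(b))/2."""
    bad = 0
    for _ in range(trials):
        a = rng.normal(size=dim)
        b = rng.normal(size=dim)
        if loss((a + b) / 2) > (loss(a) + loss(b)) / 2 + 1e-9:
            bad += 1
    return bad

print("logistic regression violations:", chord_violations(logreg_loss, n))
print("one-hidden-layer net violations:", chord_violations(mlp_loss, n * h + h))
```

The first count should be exactly zero, since the logistic loss is an affine map composed with a convex function; the second is typically positive. Permutation symmetry alone already rules out convexity for the hidden layer: swapping any two hidden units (along with their weights) leaves the loss unchanged, so every minimum comes with many symmetric copies, which is one reason the number of local minima is so large.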

There is also theoretical work showing that, for networks of sufficient size, landing in a local minimum via gradient descent is not a serious problem from a practical standpoint: most local minima of large networks have cost values close to that of the global minimum. Here's a thread which discusses that and points to the relevant paper from Yann LeCun's group.
