I asked ChatGPT about the look of the "weight landscape" and it gave good pointers

Well, the conditions under which that convexity claim holds are very limited: it applies only to logistic regression, which Prof Ng points out can be viewed as a trivial neural network consisting of just the output layer. Once you add even a single hidden layer, convexity is history. As the thread David linked shows, the number of local minima is huge.
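To make the contrast concrete, here is a minimal NumPy sketch (the toy data, network sizes, and the chord test itself are my own assumptions, not from the linked thread). It checks midpoint convexity, f((a+b)/2) <= (f(a)+f(b))/2, along random chords in parameter space: the logistic-regression cross-entropy should never violate it, while the same loss with one hidden tanh layer typically does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (sizes chosen arbitrarily, just for illustration).
m, n, h = 200, 5, 4                     # examples, features, hidden units
X = rng.normal(size=(m, n))
y = (rng.uniform(size=m) < 0.5).astype(float)

def logreg_loss(w):
    """Cross-entropy of plain logistic regression: provably convex in w."""
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)  # numerically stable log loss

def mlp_loss(theta):
    """Same loss with one hidden tanh layer: no longer convex in theta."""
    W1 = theta[: n * h].reshape(n, h)
    w2 = theta[n * h:]
    z = np.tanh(X @ W1) @ w2
    return np.mean(np.logaddexp(0.0, z) - y * z)

def chord_violations(loss, dim, trials=500):
    """Count failures of midpoint convexity f((a+b)/2) <= (f(a)+f(b))/2."""
    bad = 0
    for _ in range(trials):
        a = rng.normal(size=dim)
        b = rng.normal(size=dim)
        if loss((a + b) / 2) > (loss(a) + loss(b)) / 2 + 1e-9:
            bad += 1
    return bad

print("logistic regression violations:", chord_violations(logreg_loss, n))
print("one-hidden-layer net violations:", chord_violations(mlp_loss, n * h + h))
```

The first count should be exactly zero, since the logistic loss is an affine map composed with a convex function; the second is typically positive. Permutation symmetry alone already rules out convexity for the hidden layer: swapping any two hidden units (along with their weights) leaves the loss unchanged, so every minimum comes with many symmetric copies, which is one reason the number of local minima is so large.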

There is also theoretical work showing that, for networks of sufficient size, landing in a local minimum via gradient descent is not a serious problem from a practical standpoint: most local minima of large networks have cost values close to that of the global minimum. Here's a thread which discusses that and points to the relevant paper from Yann LeCun's group.
