Cost function shape in neural network

Hi @KiraDiShira ,

Let me attempt to answer this quoted question, from the more general concept of Gradient Descent:

The goal when training a model is to optimize its parameters so that its predictions come very close to the ground truth. For this optimization we use gradient descent, which assumes the training process can converge.

In simple systems with very few dimensions, you would use full (batch) gradient descent, which computes the gradient over the whole dataset at every step. In such low-dimensional models (1-2 parameters), the optimizer can indeed land in a local minimum that acts as a trap; that does happen (see the toy example below).
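Here is a minimal, purely illustrative Python sketch (the quartic loss and its numbers are made up for this example, not anything from the course): plain gradient descent started on the wrong side of the landscape settles in the shallower local minimum and never reaches the global one.

```python
# Toy 1-D "loss" with a global minimum near x ≈ -1.30 and a
# shallower local minimum near x ≈ 1.13 (made-up example).
def loss(x):
    return x**4 - 3 * x**2 + x

def grad(x):
    return 4 * x**3 - 6 * x + 1

x = 2.0      # start on the side of the landscape nearer the local minimum
lr = 0.01    # learning rate
for _ in range(1000):
    x -= lr * grad(x)     # full (batch) gradient descent step

print(f"converged to x = {x:.2f}, loss = {loss(x):.2f}")
# Ends up at the local minimum near 1.13, even though the
# global minimum near -1.30 has a lower loss.
```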

Then we have more complex models that involve perhaps millions of parameters. For these we would use Stochastic Gradient Descent (SGD).

Stochastic Gradient Descent (SGD), which is mainly used in complex NNs, is unlikely to get stuck in local minima because it is noisy by nature: each update is computed from a small mini-batch rather than the full dataset. That noise can sometimes kick the optimizer out of a shallow local minimum. So you might say, “hmm, is it just a matter of luck?”
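To see that effect concretely, here is the same toy quartic with explicit Gaussian noise added to the gradient as a crude stand-in for mini-batch noise (the noise level and step counts are arbitrary assumptions for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(x):                 # gradient of the same toy loss as above
    return 4 * x**3 - 6 * x + 1

escapes = 0
for trial in range(100):
    x, lr = 2.0, 0.01
    for _ in range(5000):
        # Gaussian noise is a crude stand-in for the noise of a
        # mini-batch gradient estimate; real SGD gets it from
        # sampling a different mini-batch at every step.
        noisy_grad = grad(x) + rng.normal(0.0, 10.0)
        x -= lr * noisy_grad
    if x < 0:                # finished in the deeper basin near -1.30
        escapes += 1

print(f"{escapes}/100 noisy runs ended near the global minimum")
```

With these (arbitrary) settings many of the noisy runs hop over the barrier into the deeper basin, while the noise-free version from the same start never does.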

Well, the real reason why NNs can be optimized is that there aren’t that many local minima that act as ‘traps’. Complex NNs have parameter spaces of such high dimension that poor local minima are rare.

When we humans picture a function on a graph, we usually think in 2D or maybe 3D, and in those low-dimensional pictures it is easy to find local-minimum traps, as discussed above for the simpler models that use full gradient descent.

Already in 3D, however, we start to gain the intuition that trap-like local minima are rare: what looks like a local minimum often turns out to be a saddle, where descent can continue along one of the other directions. Now extrapolate that intuition to a complex neural network. Its loss surface lives in a space of perhaps millions of dimensions (far more than we can visualize), and a point would have to curve upward in every single one of those directions to be a true local minimum, which makes local-minimum traps rare. A toy saddle is sketched below.
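For the saddle intuition, here is a tiny sketch with the classic saddle f(x, y) = x² − y² (again just an illustrative toy): gradient descent started slightly off the x-axis does not get stuck at the origin; it slides off along the descending y direction.

```python
import numpy as np

# Classic saddle: f(x, y) = x**2 - y**2. The origin looks like a
# minimum along x but is a maximum along y, so descent can always
# continue along the y direction.
def grad(p):
    x, y = p
    return np.array([2 * x, -2 * y])

p = np.array([1.0, 1e-3])    # start almost exactly on the x-axis
lr = 0.1
for _ in range(40):
    p = p - lr * grad(p)

print(p)
# The x coordinate has shrunk toward 0 while y has grown:
# the iterate slides off the saddle instead of getting stuck.
```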

This is my understanding of this situation :slight_smile:
