When to stop training a neural network?

donnie1123 · January 8, 2022, 8:01am

In this picture, it says that in reality the training error goes up as the depth increases. I wonder when the training process should stop (e.g., when the parameter updates become small?). And how can we make sure that the result is not just a local minimum of the loss function, since a network with convolutional layers does not seem to be convex? Thanks.

paulinpaloalto · May 14, 2022, 8:02pm

None of the cost functions for these networks are convex. You need to use cost curves like the ones Prof Ng is explaining here to judge when you are no longer making progress, or are perhaps diverging rather than converging. With solution surfaces this complex, there is never any guarantee of smooth, monotonic convergence: you may need to adjust hyperparameters such as the learning rate or the number of iterations, or even change the architecture of your network or apply regularization, to get things to work.
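One way to turn "judge from the cost curve" into an automatic check is to watch the recorded costs for a plateau or a blow-up. The sketch below is a hypothetical helper (`run_until_plateau`, `rel_tol`, `patience` and `divergence_factor` are assumptions, not anything from the course): it stops when the cost has not improved by a relative `rel_tol` for `patience` iterations, or when the cost becomes non-finite or grows well past its starting value. It is demonstrated on a toy one-parameter cost J(w) = (w - 3)^2 + 1, not on a real network.

```python
import math
from typing import Callable, List, Tuple


def run_until_plateau(
    step: Callable[[], float],
    max_iters: int = 10_000,
    rel_tol: float = 1e-6,
    patience: int = 10,
    divergence_factor: float = 10.0,
) -> Tuple[List[float], str]:
    """Call step() repeatedly (one training iteration that returns the cost)
    and stop on a plateau, on divergence, or after max_iters iterations."""
    costs: List[float] = []
    best = math.inf
    stale = 0  # iterations since the last meaningful improvement
    for i in range(max_iters):
        cost = step()
        costs.append(cost)
        # Divergence: cost is inf/nan or far above where we started.
        if not math.isfinite(cost) or cost > divergence_factor * costs[0]:
            return costs, "diverged"
        # Meaningful improvement relative to the best cost seen so far.
        if i == 0 or best - cost > rel_tol * best:
            best, stale = cost, 0
        else:
            stale += 1
            if stale >= patience:
                return costs, "plateau"
    return costs, "max_iters"


def make_gd_step(lr: float, w0: float = 0.0) -> Callable[[], float]:
    """Toy problem (assumed for illustration): gradient descent on
    J(w) = (w - 3)^2 + 1, whose minimum cost is 1 at w = 3."""
    w = w0

    def step() -> float:
        nonlocal w
        w -= lr * 2.0 * (w - 3.0)  # dJ/dw = 2(w - 3)
        return (w - 3.0) * (w - 3.0) + 1.0

    return step


costs, status = run_until_plateau(make_gd_step(lr=0.1))
print(status, len(costs), costs[-1])
costs, status = run_until_plateau(make_gd_step(lr=1.1))  # learning rate too large
print(status, len(costs), costs[-1])
```

With the small learning rate the cost settles near 1 and the plateau rule fires; with the large one each step overshoots further, so the divergence rule fires after a few iterations. In practice the same kind of check is often applied to a dev-set cost rather than the training cost.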

Also note that there is never any guarantee that a given solution is not a local minimum, but it turns out that this is not a big problem in general. Prof Ng makes this comment in a couple of places in the lectures, but doesn't go into the details. The math here is pretty deep, but here's a thread that points to a well-known paper from Yann LeCun's group on the question of whether good (achievable) solutions exist for this kind of optimization problem.
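To see why converging is not the same as finding the global minimum, here is a deliberately tiny sketch: plain gradient descent on an assumed 1-D non-convex function (not a neural network) started from two different points. Both runs converge (the gradient goes to zero), but to different minima with different loss values. It says nothing about the high-dimensional loss surfaces discussed in the paper above; it only illustrates the non-convexity issue.

```python
from typing import Tuple


def loss(x: float) -> float:
    """Toy 1-D non-convex 'loss' (assumed for illustration, not a network):
    global minimum near x = -1.30, a worse local minimum near x = 1.13."""
    return x**4 - 3.0 * x**2 + x


def grad(x: float) -> float:
    """Derivative of loss(x)."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def gradient_descent(x0: float, lr: float = 0.01, iters: int = 2000) -> Tuple[float, float]:
    """Plain gradient descent from x0; returns (final x, final loss)."""
    x = x0
    for _ in range(iters):
        x -= lr * grad(x)
    return x, loss(x)


for x0 in (-2.0, 2.0):
    x_star, l_star = gradient_descent(x0)
    print(f"start {x0:+.1f} -> x = {x_star:+.4f}, loss = {l_star:+.4f}, grad = {grad(x_star):+.1e}")
```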

donnie1123 · May 17, 2022, 9:23am

Got it. Thanks very much for your reply.

donnie1123 · May 17, 2022, 9:32am

One more question. A neural network (NN) with more layers definitely contains an NN with fewer layers: for example, if we set the extra layers to the identity function with no bias, it just becomes the NN with fewer layers. So more layers should always be able to give us a lower training loss. Why, then, does the training loss rise as the number of layers increases? Is it just because the deeper network is harder to train?
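The identity-layer argument can be checked directly. In this sketch (pure Python; the network sizes and weights are arbitrary, and `identity_layer` is a hypothetical helper), a 2 -> 3 -> 1 ReLU network and the same network with an extra 3 -> 3 hidden layer (W = I, b = 0) produce identical outputs. One caveat with ReLU: the extra layer reproduces its input only because that input is already non-negative (it is the output of the previous ReLU). The sketch shows only that the deeper network can represent the shallower one; it says nothing about whether gradient descent from a random initialization will actually find those weights.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Layer = Tuple[Matrix, Vector]  # (W, b)


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def dense(layer: Layer, x: Vector) -> Vector:
    """Return W x + b for a single example."""
    W, b = layer
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(layers: List[Layer], x: Vector) -> Vector:
    """ReLU on every hidden layer, linear output layer."""
    a = x
    for layer in layers[:-1]:
        a = relu(dense(layer, a))
    return dense(layers[-1], a)


def identity_layer(n: int) -> Layer:
    """Hypothetical extra layer: W = I (n x n), b = 0."""
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return W, [0.0] * n


# A small hand-picked 2 -> 3 -> 1 network (weights are arbitrary).
shallow: List[Layer] = [
    ([[0.5, -1.0], [2.0, 0.3], [-0.7, 0.8]], [0.1, -0.2, 0.0]),
    ([[1.0, -0.5, 0.25]], [0.05]),
]
# The same network with one identity hidden layer inserted: 2 -> 3 -> 3 -> 1.
deep: List[Layer] = [shallow[0], identity_layer(3), shallow[1]]

for x in ([0.4, -1.2], [-2.0, 0.5], [1.0, 1.0]):
    print(forward(shallow, x), forward(deep, x))
```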

donnie1123 · May 17, 2022, 10:28am

I mean: if we train for more steps, until the cost function stabilizes, will more layers still give us a lower training loss? Thanks.

paulinpaloalto · May 17, 2022, 2:38pm

Yes, a larger network (more layers and/or more neurons per layer) can represent a more complex function, so you would expect it to eventually reach a lower error and better accuracy. But it is more expensive to train (it requires more iterations and more compute per iteration), because there are more parameters to learn. The other important point Prof Ng makes in the lectures is that, in addition to the added training cost, a more complex network may simply overfit the training data. So there is such a thing as “overkill”: it is a balancing act, there is no cut-and-dried magic answer, and you have to experiment to figure out how big a network you need. Here in Course 1 there isn’t time to cover everything, but how to choose the size of your networks and how to tune other hyperparameters are major topics of DLS Course 2 and Course 3, so please stay tuned to learn more about all this.
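To make "more parameters that need to be learned" concrete, here is a small sketch that counts the weights and biases of a fully connected network: each layer l contributes n_l * n_(l-1) weights plus n_l biases. The layer sizes below are hypothetical, chosen only for illustration (12288 = 64 x 64 x 3 input pixels, one output unit).

```python
from typing import Sequence


def count_dense_params(layer_sizes: Sequence[int]) -> int:
    """Learnable parameters (weights + biases) of a fully connected network
    with layer sizes [n_0, n_1, ..., n_L]: sum over l of n_l * n_(l-1) + n_l."""
    return sum(n_out * n_in + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical sizes: a baseline, one extra 20-unit hidden layer, and a wider first layer.
for sizes in ([12288, 20, 7, 5, 1], [12288, 20, 20, 7, 5, 1], [12288, 40, 7, 5, 1]):
    print(sizes, count_dense_params(sizes))
```

In this particular example the first layer dominates: inserting an extra 20-unit hidden layer adds only 420 parameters, while doubling the width of the first hidden layer roughly doubles the total. For dense layers the forward and backward computation per example scales roughly with this count, which is one reason bigger networks cost more per iteration.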
