Good day everyone. Does a fluctuating cost value for a neural net signify a problem with the model?

I was working on an image classification problem, and I found that the cost kept fluctuating … say

Cost after iteration 0: 0.67854
Cost after iteration 100: 0.598765
Cost after iteration 200: 0.456535
Cost after iteration 300: 0.654738
Cost after iteration 400: 0.453278
Cost after iteration 500: 0.346865
Cost after iteration 600: 0.4768976
Cost after iteration 700: 0.3657832

…

Although I was able to find hyperparameter values that gave good test accuracy after a series of trials, and the train accuracy in all the trials was good enough to confirm learning … the fluctuating cost was kind of troubling to me. It happened on almost all the trials.

Does this signify a problem with the model? And is there a way to fix it?

My initial intuition was that gradient descent would keep decreasing the cost until it converges at a local minimum.


If you reported more frequently than every 100 iterations, you would likely see even more fluctuation. It is not necessary that the cost decrease monotonically; in fact it is quite normal for gradient descent to overshoot a little. Since you seem comfortable running and comparing experiments, maybe try a smaller learning rate, or better yet, do some experiments with dynamic learning rates. Let us know what you find?
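For example, one simple dynamic-learning-rate scheme is exponential decay, where the learning rate shrinks smoothly as training progresses. A minimal sketch (the function name and decay values here are just illustrative, not from any particular course or framework):

```python
def decayed_lr(initial_lr, decay_rate, iteration):
    """Exponential decay: the learning rate is multiplied by
    decay_rate once per iteration, so later steps are smaller
    and overshooting near a minimum is reduced."""
    return initial_lr * decay_rate ** iteration

# Illustrative schedule: start at 0.1, shrink by 0.5% each iteration
for it in (0, 100, 500):
    print(f"iteration {it}: lr = {decayed_lr(0.1, 0.995, it):.5f}")
```

Large steps early on speed up learning; smaller steps later help the cost settle instead of bouncing around the minimum.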

PS: there is some food for thought at this non-Discourse URL:


Thank you very much for this @ai_curious …

I am new to the deep learning world, so I lean on experimental learning … I try to observe the impact of different changes on the model …

I appreciate the piece you directed me to … I am really gaining understanding from it.

I have tried even smaller learning rates … although the learning process becomes very slow, the fluctuations still persist … that was why it began troubling me.

If the cost is for the validation set, then it is completely normal. If the cost is for the training set, then it depends, but most of the time it is still normal.

If your dataset is larger than 100*batch_size, then it is normal to have fluctuations, because each reported cost is computed over different data. Even if you are using all the available data, it may still occur because of the batch size; but if it happens within just a few epochs, then you should probably reduce the learning rate. Anyhow, since the numbers are trending downward as you show, I don’t see much of a problem.

With respect to your comment: “I had this initial intuition that gradient descent will keep on decreasing the cost till it converges at a local minima.”

Well, this is true only if the gradients are computed over all the training data at once, not in mini-batches.
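To illustrate the difference, here is a small self-contained sketch (a toy linear-regression model standing in for a neural net; all names, sizes, and values are illustrative). The cost measured on each random mini-batch jumps around from step to step, even while the cost measured on the full training set decreases much more smoothly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 1000 examples, 5 features, small label noise
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def cost(w, Xb, yb):
    """Mean squared error over the given examples."""
    return 0.5 * np.mean((Xb @ w - yb) ** 2)

w = np.zeros(5)
lr, batch_size = 0.1, 32
minibatch_costs, fullbatch_costs = [], []

for step in range(300):
    idx = rng.integers(0, len(X), size=batch_size)  # random mini-batch
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w - yb) / batch_size        # gradient on this batch only
    w -= lr * grad
    minibatch_costs.append(cost(w, Xb, yb))   # noisy: different data each step
    fullbatch_costs.append(cost(w, X, y))     # smooth: same data every step
```

Plotting the two lists side by side makes the point visually: the mini-batch curve fluctuates the way the cost log above does, while the full-dataset curve trends down steadily.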

Hope it helps

Yes … it surely did! Thank you very much @isaac.casm