Good day everyone. Does a fluctuating cost value for a neural net signify a problem with the model?

I was working on an image classification problem, and I found that the cost kept fluctuating … say

Cost after iteration 0: 0.67854
Cost after iteration 100: 0.598765
Cost after iteration 200: 0.456535
Cost after iteration 300: 0.654738
Cost after iteration 400: 0.453278
Cost after iteration 500: 0.346865
Cost after iteration 600: 0.4768976
Cost after iteration 700: 0.3657832

…

Although I was able to find hyperparameter values that gave good test accuracy after a series of trials, and the train accuracy in all the trials was good enough to confirm learning … the fluctuating cost was kind of troubling to me. It happened on almost all the trials.

Does this signify a problem with the model? And is there a way to fix it?

My initial intuition was that gradient descent would keep decreasing the cost until it converges at a local minimum.


If you reported more frequently than every 100 iterations, you would likely see even more fluctuation. It is not necessary that the cost decrease monotonically; in fact it is quite normal for gradient descent to overshoot a little. Since you seem comfortable running and comparing experiments, maybe try a smaller learning rate, or better yet, do some experiments with dynamic learning rates. Let us know what you find?
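For example, one simple dynamic-learning-rate scheme is exponential decay, where the learning rate shrinks smoothly as training progresses. A minimal sketch (the function name and decay values here are just illustrative, not from any particular course or framework):

```python
def decayed_lr(initial_lr, decay_rate, iteration):
    """Exponential decay: the learning rate is multiplied by
    decay_rate once per iteration, so later steps are smaller
    and overshooting near a minimum is reduced."""
    return initial_lr * decay_rate ** iteration

# Illustrative schedule: start at 0.1, shrink by 0.5% each iteration
for it in (0, 100, 500):
    print(f"iteration {it}: lr = {decayed_lr(0.1, 0.995, it):.5f}")
```

Large steps early on speed up learning; smaller steps later help the cost settle instead of bouncing around the minimum.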

PS: there is some food for thought at this non-Discourse URL:


Thank you very much for this @ai_curious …

I am new to the deep learning world, so I lean on experimental learning … I try to observe the impact of different changes on the model …

I appreciate the piece you directed me to … I am really gaining understanding from it.

I have tried even smaller learning rates … although the learning process becomes very slow, the fluctuations still persist … that was why it began troubling me.

If the cost is for the validation set, then it is completely normal. If the cost is for the training set, then it depends, but most of the time it is still normal.

If your dataset is larger than 100*batch_size, then it is normal to have fluctuations, because each reported cost is computed over different data. Even if you are using all the available data, it may still occur because of the batch size; but if it happens within just a few epochs, then you should probably reduce the learning rate. Anyhow, since the numbers are trending downward as you show, I don’t see much of a problem.

With respect to your comment: “I had this initial intuition that gradient descent will keep on decreasing the cost till it converges at a local minima.”

Well, this is true only if the gradients are computed over all the training data at once, not in mini-batches.
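To illustrate the difference, here is a small self-contained sketch (a toy linear-regression model standing in for a neural net; all names, sizes, and values are illustrative). The cost measured on each random mini-batch jumps around from step to step, even while the cost measured on the full training set decreases much more smoothly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 1000 examples, 5 features, small label noise
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def cost(w, Xb, yb):
    """Mean squared error over the given examples."""
    return 0.5 * np.mean((Xb @ w - yb) ** 2)

w = np.zeros(5)
lr, batch_size = 0.1, 32
minibatch_costs, fullbatch_costs = [], []

for step in range(300):
    idx = rng.integers(0, len(X), size=batch_size)  # random mini-batch
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w - yb) / batch_size        # gradient on this batch only
    w -= lr * grad
    minibatch_costs.append(cost(w, Xb, yb))   # noisy: different data each step
    fullbatch_costs.append(cost(w, X, y))     # smooth: same data every step
```

Plotting the two lists side by side makes the point visually: the mini-batch curve fluctuates the way the cost log above does, while the full-dataset curve trends down steadily.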

Hope it helps

Yes … it surely did! Thank you very much @isaac.casm