In the practice lab of week 3, when training the complex neural network, I noticed that the loss sometimes increases from one epoch to the next. I used the Adam optimizer with a learning rate of 0.01, as the hint suggested, and I thought the loss should decrease every epoch. Can someone explain why the loss increases, please?
It's normal for the loss to occasionally increase during training, even with the Adam optimizer. These fluctuations are part of the optimization process, and as long as the overall trend is downward, the model is still learning effectively.
As far as I remember, it is OK for the loss to increase in a few epochs, but in general it should decrease; a small sketch below shows a quick way to check the overall trend. How much it fluctuates depends on your model architecture and hyper-parameter values.
Hope it helps! Feel free to ask if you need further assistance.
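If you want to see this for yourself, here is a minimal sketch, assuming a TensorFlow/Keras setup like the one in the lab; the model and data below are just placeholders, not the lab's actual ones. The loss curve will usually wiggle from epoch to epoch, and what matters is the downward trend.

```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Placeholder data just to make the example runnable.
X = np.random.rand(1000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss="binary_crossentropy")

history = model.fit(X, y, epochs=50, batch_size=32, verbose=0)

# Plot the per-epoch training loss: it fluctuates, but should trend down.
plt.plot(history.history["loss"])
plt.xlabel("epoch")
plt.ylabel("training loss")
plt.show()
```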
@Alireza_Saei Thanks for your reply! I think I understand it somewhat now. By the way, it seems Dr. Andrew doesn't mention these "fluctuations" in the specialization. Can you suggest some documents or other courses about this?
In a certain epoch the model might see data it has not previously seen, because of data shuffling and splitting, so the loss for that epoch might jump up; but in the next epoch, if it comes across that data again, it won't jump up anymore.
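To illustrate just the shuffling part (a tiny sketch of my own, assuming mini-batch training as in the lab): the dataset itself is the same every epoch, but reshuffling changes which samples land in each mini-batch, so the sequence of gradient steps the optimizer takes differs from epoch to epoch.

```python
# Same 8 samples every epoch, but the mini-batch composition (and therefore
# the order of gradient steps) changes after each reshuffle.
import numpy as np

rng = np.random.default_rng(0)
n_samples, batch_size = 8, 4
indices = np.arange(n_samples)

for epoch in range(3):
    order = rng.permutation(indices)          # reshuffle at the start of each epoch
    batches = order.reshape(-1, batch_size)   # split into mini-batches
    print(f"epoch {epoch}: batches = {batches.tolist()}")
```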
@gent.spah I'm afraid I can't agree with you. I thought that in an epoch the model passes through the entire dataset, so it can't "see data that it has not previously seen".
@fantastichaha11 remember, with SGD, say, our solution spaces are high-dimensional hypersurfaces and massively complex.
We can't just immediately "see" the minimum. If we could, we wouldn't even have to iterate at all: we'd just head right for the goal.
And though we have a strict cost function and its derivative to "guide" us, we still have to take a bit of a "blind leap" each time, and there is no guarantee (it can also depend on the actual distribution of the data we are looking at) that each step is always better; the small numeric sketch below illustrates this.
Though most of the time it is.
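Here is a toy 1D illustration of that "blind leap" (my own sketch, not from the course): on the simple loss f(w) = w², a gradient step with a modest learning rate improves the loss, while a step with a too-large learning rate overshoots the minimum and makes it worse, even though the gradient points the right way.

```python
# Gradient descent on f(w) = w^2: a single step can increase the loss if the
# learning rate is too large.
def f(w):
    return w ** 2

def grad(w):
    return 2 * w

w = 1.0
for lr in (0.1, 1.1):                       # modest vs. too-large step size
    w_new = w - lr * grad(w)
    print(f"lr={lr}: loss before={f(w):.3f}, after={f(w_new):.3f}")
# lr=0.1: 1.000 -> 0.640  (improves)
# lr=1.1: 1.000 -> 1.440  (gets worse)
```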
@Nevermnd has a good point here!
As you pointed out, the model trains through every data point in each epoch. But, as @Alireza_Saei noted, the training setup also includes choices such as batch normalization and random augmentation (e.g. random flips), and these behave slightly differently in every epoch. The weights carry over from one epoch to the next, but the loss is recomputed from scratch on that epoch's re-shuffled, re-augmented presentation of the data, so if a particular epoch doesn't learn anything new, its loss can end up a bit higher than the previous one. That is completely normal as long as the following epochs resume a decreasing trend; if the loss keeps rising instead, one has to suspect overfitting, or a problem with the parameters or optimization method used.
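One practical way to tell normal fluctuation apart from a model that has stopped improving or started overfitting is to track a validation set and stop when its loss stops improving. A small sketch, assuming a Keras model with placeholder data (not the lab's actual model):

```python
import numpy as np
import tensorflow as tf

# Placeholder data and model, just to make the sketch self-contained.
X = np.random.rand(500, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",         # watch validation loss, not training loss
    patience=5,                 # tolerate a few bad epochs before stopping
    restore_best_weights=True,  # roll back to the best epoch seen
)

model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[early_stop], verbose=0)
```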
The same logic applies to accuracy: accuracy should increase while the loss decreases. Remember, though, that sometimes a drop in loss still comes with a drop in accuracy, which is again a warning sign for the model.
The accuracy curve records how accurate the model's predictions are on the given data, while the loss curve records the actual difference between the model's predictions and the true outputs.
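A tiny numeric example (my own, not from the course) of why the two curves can disagree: accuracy only counts thresholded predictions, while binary cross-entropy also rewards confidence, so one set of predictions can have a lower loss and a lower accuracy at the same time.

```python
import numpy as np

def bce(y_true, p):
    """Binary cross-entropy averaged over the samples."""
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def accuracy(y_true, p):
    """Fraction of predictions correct after thresholding at 0.5."""
    return np.mean((p >= 0.5) == y_true)

y   = np.array([1, 1, 1, 0])
p_a = np.array([0.55, 0.55, 0.55, 0.45])  # all correct, but barely confident
p_b = np.array([0.95, 0.95, 0.45, 0.05])  # one mistake, very confident elsewhere

print(f"A: loss={bce(y, p_a):.3f}  acc={accuracy(y, p_a):.2f}")  # ~0.598, 1.00
print(f"B: loss={bce(y, p_b):.3f}  acc={accuracy(y, p_b):.2f}")  # ~0.238, 0.75
```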
This part of model training is actually explained very well in the Deep Learning Specialization.
Regards
DP