Loss is not stable when training a neural network

In the Week 3 practice lab, when training the complex neural network, I noticed that the loss sometimes increases from one epoch to the next. I used the Adam optimizer with a learning rate of 0.01, as the hint suggested, and I thought the loss should decrease every epoch. Can someone explain why the loss increases, please?

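For context, my setup looks roughly like this (paraphrased from memory with dummy data; the layer sizes and shapes here are just an illustration, not the exact lab code):

```python
import numpy as np
import tensorflow as tf

# Dummy data standing in for the lab's dataset -- shapes and layer sizes
# are illustrative only, not the actual lab code.
X_train = np.random.randn(1000, 400).astype("float32")
y_train = np.random.randint(0, 6, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(120, activation="relu"),
    tf.keras.layers.Dense(40, activation="relu"),
    tf.keras.layers.Dense(6, activation="linear"),
])

model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
)

# The per-epoch loss printed here is what sometimes goes up instead of down.
history = model.fit(X_train, y_train, epochs=100, verbose=1)
```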

Hi @fantastichaha11

It’s normal for the loss to occasionally increase during training, even with the Adam optimizer. These fluctuations are part of the optimization process, and as long as the overall trend is downward, the model is still learning effectively.

As far as I remember, it is OK for your loss to increase in a few epochs, but in general it must decrease! How much it fluctuates depends on your model architecture and hyperparameter values.
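
If you want to check the overall trend rather than individual epochs, one simple sketch (the helper name here is just illustrative) is to plot a moving average of the per-epoch loss next to the raw curve:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_loss_trend(losses, window=5):
    """Plot per-epoch loss together with a moving average to show the overall trend."""
    losses = np.asarray(losses, dtype=float)
    smoothed = np.convolve(losses, np.ones(window) / window, mode="valid")
    plt.plot(losses, label="per-epoch loss")
    plt.plot(np.arange(window - 1, len(losses)), smoothed,
             label=f"{window}-epoch moving average")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.show()

# e.g. plot_loss_trend(history.history["loss"]) with the History object returned by model.fit
```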

Hope it helps! Feel free to ask if you need further assistance.

3 Likes

@Alireza_Saei Thanks for your reply! I think I understand it a little better now. By the way, it seems like Dr. Andrew doesn’t mention these “fluctuations” in the specialization. Can you suggest some documents or other courses about this?

1 Like

In a certain epoch the model might see data it has not previously seen, because of data shuffling and how the data is split into batches, so the loss at that epoch might jump up; but in the next epoch, if it comes across that data again, it won’t jump up anymore.
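
To make the shuffling point concrete, here is a tiny illustrative sketch (not the lab's code): with shuffling turned on, the mini-batches the optimizer actually steps on are composed differently in every epoch:

```python
import numpy as np
import tensorflow as tf

X = np.arange(10)  # stand-in for 10 training examples

# shuffle() reshuffles on every pass by default (reshuffle_each_iteration=True),
# so the mini-batches are composed differently in each epoch.
ds = tf.data.Dataset.from_tensor_slices(X).shuffle(buffer_size=10).batch(5)

for epoch in range(2):
    print(f"epoch {epoch}:")
    for batch in ds:
        print("  batch:", batch.numpy())
```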

1 Like

@gent.spah I’m afraid I can’t agree with you. I thought that in an epoch the model passes through the whole dataset, so it can’t “see data that it has not previously seen”.

2 Likes

@fantastichaha11 remember, with SGD, say, our solution spaces are potentially very high-dimensional and massively complex.

We can’t just immediately ‘see’ the minimum-- If we could we wouldn’t even have to iterate at all: We’d just head right for the goal.

And though we have a well-defined cost function and its derivative to ‘guide us’, we still have to take a bit of a ‘blind leap’ at each step, and there is no guarantee (it can also depend on the actual distribution of the data we are looking at) that every step is an improvement--

Though most of the time it is.
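
To make that concrete, here is a toy sketch (nothing from the course code): plain gradient descent on f(w) = w², but with noise added to the gradient to mimic mini-batch estimates. Most steps reduce the loss, but there is no guarantee that every single one does:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimise f(w) = w**2 with noisy (mini-batch style) gradient estimates.
# Individual steps can make the loss go up, but the overall trend is down.
w, lr = 5.0, 0.1
for step in range(20):
    noisy_grad = 2 * w + rng.normal(scale=4.0)  # true gradient plus noise
    w -= lr * noisy_grad
    print(f"step {step:2d}: loss = {w ** 2:.3f}")
```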

2 Likes

@Nevermnd has a good point here!

Hi @fantastichaha11

As you pointed out, in each epoch the model trains through every data point. But, as @Alireza_Saei mentioned, the model also has hyperparameters and optimization choices (such as batch normalization or random-flip augmentation), and these come into play again in the next training cycle, where the model trains again through every layer and input. The next epoch does not simply continue from the previous epoch’s loss value; it keeps learning on top of what it has already retained. So if, in a new epoch, those settings don’t let the model learn anything new, the loss can end up a bit higher than the previous one, which is completely normal as long as the following epochs get back to a decreasing trend. If they don’t, then one has to suspect problems such as overfitting, depending on the parameters or optimization method used.

The same logic applies to accuracy: overall, accuracy should increase while loss decreases. Remember, though, that sometimes a decrease in loss still comes with a decrease in accuracy, which is again not a good sign for the model.

The accuracy curve records how often the model’s predictions are correct on the given data, while the loss curve records the actual difference between the model’s predicted output and the true output.
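
As a small illustration of that difference (toy numbers, assuming a 3-class softmax classifier): the predicted class, and therefore the accuracy, can stay the same while the cross-entropy loss changes, because the loss looks at the predicted probabilities rather than just the argmax:

```python
import numpy as np

def cross_entropy(probs, true_class):
    """Cross-entropy loss of one softmax prediction against the true class."""
    return -np.log(probs[true_class])

true_class = 0

confident   = np.array([0.90, 0.05, 0.05])  # correct and confident
unconfident = np.array([0.40, 0.35, 0.25])  # still correct (argmax is class 0), but less confident

for name, probs in [("confident", confident), ("unconfident", unconfident)]:
    correct = int(np.argmax(probs) == true_class)  # what accuracy counts
    loss = cross_entropy(probs, true_class)        # what the loss curve records
    print(f"{name}: accuracy contribution = {correct}, loss = {loss:.3f}")
```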

This part of model training is actually explained very well in the Deep Learning Specialization.

Regards
DP

2 Likes