Hey!
I have trained a neural network with the TensorFlow framework on the well-known fashion_mnist dataset. I use RMSProp as the optimizer, and I also tried the Adam optimizer, but every time the cross-validation error keeps oscillating and, on average, it increases.
Any idea where I might be going wrong, or what improvements I can make?
It would shed more light on the problem to know how your training error is behaving while your validation error is oscillating or increasing. But it sounds like this may be a form of overfitting, so regularization might be one other thing to try.
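In case it helps, here is a minimal sketch of what adding regularization could look like in Keras. The layer sizes, dropout rate, and L2 factor below are illustrative guesses, not tuned values:

```python
import tensorflow as tf

# Illustrative model for fashion_mnist (28x28 grayscale images, 10 classes).
# The dropout rate (0.3) and L2 factor (1e-4) are starting points to tune,
# not recommendations.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(
        128, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dropout(0.3),  # randomly zero 30% of activations in training
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Both dropout and the L2 penalty discourage the network from memorizing the training set, which is one common way to reduce the gap between training and validation error.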
Hey!
I am curious to know how you figured out that regularization would solve this issue. As far as I know, we use regularization when our model overfits the data, which results in poor generalization…
Hi,
I think, looking at your graph, we can’t call it overfitting yet.
I advise you to increase the batch size first.
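For what it’s worth, batch size is just an argument to `model.fit`. Here is a self-contained sketch using synthetic stand-in data with fashion_mnist’s shape; the tiny model and the value 128 are purely illustrative:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data with fashion_mnist's shape, just to show the knob;
# with the real dataset you would pass your actual training arrays instead.
x = np.random.rand(512, 28, 28).astype("float32")
y = np.random.randint(0, 10, size=512)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")

# A larger batch averages the gradient over more examples per update, which
# usually smooths the loss curves; 128 is an illustrative value, not a rule.
history = model.fit(x, y, batch_size=128, epochs=1, verbose=0)
```

The intuition is that each update is computed from a larger sample, so the gradient estimate is less noisy and the loss curves tend to oscillate less.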
Thank you.
I have not tried to train a model on that particular dataset, but take a look at how the accuracy evolves after 8 epochs (or 8 thousand, or whatever that number means on your graphs): the training accuracy is much higher than the validation accuracy. That’s the definition of overfitting, isn’t it? Now there are two interesting follow-up questions:
- The question you asked above: does the strange divergence of the validation accuracy between epochs 3 and 5 and between 8 and 10 mean something pathological is going on?
- What do we do about the overfitting?
Maybe an easier question is: what happens if you continue for a few more epochs? Maybe you get another spell where the training and validation accuracy converge, as they did between epochs 5 and 7.
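One way to keep training for more epochs without risk is an `EarlyStopping` callback with some patience, so training stops only if the validation metric keeps failing to improve. The patience value here is illustrative:

```python
import tensorflow as tf

# Stop only after val_accuracy has failed to improve for 5 consecutive
# epochs, and roll back to the weights of the best epoch seen; patience=5
# is an illustrative value.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", mode="max", patience=5,
    restore_best_weights=True)
# Pass it via model.fit(..., callbacks=[early_stop]) and you can safely
# run many more epochs.
```

That way a temporary dip like the one between epochs 8 and 10 doesn’t cost you anything: if accuracy recovers, training continues; if not, you keep the best weights.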
This is an experimental science. I don’t claim to know the answer to question 1 as a general matter. Do you get similar behavior with both RMSprop and Adam? Have you tried any other model architectures?
I would not worry too much about the dips in validation accuracy, given the vertical axis scaling. A couple of percent variation is not likely significant.
But it’s difficult to say without knowing the size of the data sets.
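As a rough sanity check on that point: accuracy measured on n examples behaves like a binomial proportion, so its standard error is about sqrt(p(1−p)/n). With fashion_mnist’s standard 10,000-image test split and an assumed ~90% accuracy:

```python
import math

# Standard error of an accuracy estimate from n validation examples,
# modeling each prediction as an independent Bernoulli trial.
def accuracy_std_error(p, n):
    return math.sqrt(p * (1 - p) / n)

# fashion_mnist's standard test split has 10,000 images; at ~90% accuracy:
se = accuracy_std_error(0.90, 10_000)
print(f"std error: {se:.3f}, ±2 sigma: ±{2 * se:.1%}")  # std error: 0.003, ±2 sigma: ±0.6%
```

So on a 10,000-example validation set, swings of well under a percent are expected noise; on a much smaller set the band is correspondingly wider.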
Looking at those curves, if this happens only at a few particular epochs and then stabilizes, I would agree with Tom above that it’s not a major issue. Why? Because the validation set includes data the model has not seen, and when a lot of those data are concentrated in a particular pass, it’s natural to see a drop in performance for that pass. Of course, for a model to be robust it needs regularization as well, as Paul suggests.