Hello, I’m struggling to reach 95% training accuracy. The validation accuracy is always above 80% and seems to increase 1:1 with the training accuracy. I’ve tried all sorts of architectures: 3 Conv layers, 5 Conv layers, more Dense layers, or a single Dense layer.
I’m using the Adam optimizer with a batch size of 32 (also tried 64) for both the training and validation generators. I’ve also used augmentation params to improve the accuracy, but I’m stuck at a max of 90% training accuracy.
Could someone take a look at my notebook, please? Thanks.
Batch size of 32 is a good setting if you don’t know how to tune the learning rate based on batch size. See this paper for more details.
As far as the conv layers are concerned, you should be fine if you are:
Following every conv2d layer with a maxpool2d layer.
Increasing the number of conv2d filters as you go deeper in the network (in powers of 2, with 8 being the fewest filters in the network).
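The two conv-layer guidelines above can be sketched in Keras roughly as follows. This is only an illustration, not the poster's notebook: the function name build_cnn, the input shape, the number of classes, the 3x3 kernels and the size of the Dense layer are all assumptions made for the example.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(input_shape=(150, 150, 3), num_classes=2,
              base_filters=32, num_blocks=3):
    """Hypothetical helper: stack conv blocks with a growing filter count.

    Every Conv2D layer is followed by a MaxPooling2D layer, and the number
    of filters doubles in each block (32, 64, 128 with the defaults).
    """
    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))

    filters = base_filters
    for _ in range(num_blocks):
        # 3x3 kernels and "same" padding are assumptions, not from the thread.
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))
        filters *= 2  # stay on powers of 2 as the network gets deeper

    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))  # assumed Dense width
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```

Calling build_cnn() and then model.summary() shows the layer order and filter counts, which makes it easy to compare against the two points above.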
Please click my name and message me your notebook as an attachment.
There’s no need to shuffle or augment the validation set. Leaving it as it is should be okay, since we want the measurement to be the same across different training epochs.
Regarding your architecture search, start with, say, 32 filters and then gradually increase the number. In your writeup, you’ve used 18; while there is no formula for picking the number of filters, powers of 2 that are >= 8 usually work well. As far as the number of Dense layers is concerned, you can have more than 1 Dense layer as well.
If you made the model overly complex, reduce the learning rate to 0.0001 (with Adam in my case). The gradients are very sensitive if you stack up a lot of convolution and dense layers. This way I achieved 99.3% training accuracy and 85% on validation. It is kind of forcing the model to overfit extremely…
Your learning rate of 1e-4 looks small for such a batch size. The default learning rate for Adam is 1e-3, which goes well with a batch size of ~32. A bigger batch size can get better results from a higher learning rate. You might want to try learning rates like 2e-3, 3e-3, etc.
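One way to pick candidate learning rates for a larger batch size is to scale the rate in proportion to the batch size. This linear-scaling rule is a common heuristic and an assumption here, not something stated in this thread; the helper name scaled_learning_rate is hypothetical. The defaults (1e-3 at a batch size of 32) are the ones mentioned above.

```python
def scaled_learning_rate(batch_size, base_lr=1e-3, base_batch_size=32):
    """Hypothetical helper: scale the learning rate linearly with batch size.

    base_lr and base_batch_size default to the Adam default (1e-3) and the
    batch size of ~32 it is said to pair well with.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return base_lr * batch_size / base_batch_size


# Print starting points to try for a few batch sizes.
for batch_size in (32, 64, 96):
    print(batch_size, scaled_learning_rate(batch_size))
```

The result is only a starting point for a search. It could be passed to the optimizer, for example keras.optimizers.Adam(learning_rate=...), and then adjusted up or down based on the training curves.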