How could I get 80% in training and validation in the Cats vs Dogs with Data Augmentation assignment?
Here are a few hints:
- You should try different model architectures starting with a much smaller model.
- The kernel size of 10 is too high for a conv filter. A much more reasonable size for a kernel is 2 or 3.
- Increase the number of filters per conv layer gradually with depth.
- The number of units (nodes) in a dense layer is usually a power of 2 (a heuristic that can be observed in many models).
- Choice of optimizer is also important. In your case, the learning rate of 1e-4 is a bit too small, so the network has to be trained a lot longer to achieve good performance. I recommend trying out optimizers with default learning rates (try adam).
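To make the kernel-size hint concrete, here is a rough sketch in plain Python (no TensorFlow needed) of how the number of trainable parameters in a single conv layer grows with kernel size. The helper name `conv2d_param_count`, the 3-channel input, and the 16 filters are illustrative assumptions, not values taken from the assignment.

```python
def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights plus biases of a standard 2D conv layer with a square kernel."""
    weights = kernel_size * kernel_size * in_channels * filters
    biases = filters
    return weights + biases


# Assumed setup: RGB input (3 channels) feeding a first layer with 16 filters.
for k in (3, 10):
    print(f"{k}x{k} kernel: {conv2d_param_count(k, 3, 16)} parameters")
```

Under these assumptions the 10x10 layer has roughly ten times as many parameters as the 3x3 one, for the same number of filters.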
[code removed - moderator]
Would this be better?
Also, for some reason, my val_accuracy stays at 0.5 and doesn't fluctuate at all:
Epoch 1/30
1407/1407 [==============================] - 177s 125ms/step - loss: 0.6982 - accuracy: 0.4949 - val_loss: 0.6944 - val_accuracy: 0.5000
Epoch 2/30
1407/1407 [==============================] - 175s 125ms/step - loss: 0.6976 - accuracy: 0.4956 - val_loss: 0.6932 - val_accuracy: 0.5000
Epoch 3/30
1407/1407 [==============================] - 173s 123ms/step - loss: 0.6979 - accuracy: 0.4968 - val_loss: 0.6975 - val_accuracy: 0.5000
Epoch 4/30
1407/1407 [==============================] - 173s 123ms/step - loss: 0.6971 - accuracy: 0.5065 - val_loss: 0.6946 - val_accuracy: 0.5000
Epoch 5/30
1407/1407 [==============================] - 176s 125ms/step - loss: 0.6972 - accuracy: 0.4989 - val_loss: 0.6957 - val_accuracy: 0.5000
Epoch 6/30
1407/1407 [==============================] - 177s 126ms/step - loss: 0.6970 - accuracy: 0.4990 - val_loss: 0.6942 - val_accuracy: 0.5000
Epoch 7/30
1407/1407 [==============================] - 176s 125ms/step - loss: 0.6960 - accuracy: 0.4969 - val_loss: 0.6962 - val_accuracy: 0.5000
Epoch 8/30
1407/1407 [==============================] - 176s 125ms/step - loss: 0.6960 - accuracy: 0.5028 - val_loss: 0.6935 - val_accuracy: 0.5000
Epoch 9/30
1407/1407 [==============================] - 176s 125ms/step - loss: 0.6965 - accuracy: 0.4989 - val_loss: 0.6999 - val_accuracy: 0.5000
Epoch 10/30
1407/1407 [==============================] - 176s 125ms/step - loss: 0.6967 - accuracy: 0.4939 - val_loss: 0.6938 - val_accuracy: 0.5000
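A side note on reading this log: a loss hovering near 0.693 together with 50% accuracy is what binary cross-entropy gives when the model outputs a probability of about 0.5 for every image, since -ln(0.5) is about 0.693. That suggests, but does not prove, that the network has not started learning. Below is a quick check in plain Python, assuming a sigmoid output trained with binary cross-entropy (the removed notebook code is not available to confirm this); the helper name is made up for illustration.

```python
import math


def binary_cross_entropy(y_true, p):
    """Binary cross-entropy for one example with predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


# A model that always answers "50/50" gets the same loss for either label.
print(round(binary_cross_entropy(1, 0.5), 4))  # 0.6931
print(round(binary_cross_entropy(0, 0.5), 4))  # 0.6931
```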
Please click my name and message your notebook as an attachment.
The assignment asks you to use at least 3 conv layers. This means you can use more and still meet the grader criteria. How about going up to 5?
Also, leave the batch size at 32 (default) unless you are going to spend time tuning it.
Still no luck…
Add a MaxPooling2D layer after every conv layer.
So that helped with the training accuracy, but the validation accuracy is still stuck at exactly 0.5000. I'll send the new notebook.
Please fix the split_data function. It’s buggy.
See this thread for a hint on when to compute the split size.
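For readers without access to the linked thread, here is a minimal sketch of one way a train/validation split can be written. It assumes the pitfall is computing the split size before unusable (zero-length) files are dropped; that is a guess at the hint, not something confirmed in this thread. The function name `split_files` and its parameters are hypothetical and do not match the assignment's signature.

```python
import os
import random
import shutil


def split_files(source_dir, train_dir, val_dir, split_size, seed=None):
    """Copy a shuffled train/validation split of source_dir into two folders."""
    # Drop empty files first, so the split size is based on usable files only.
    usable = [
        name for name in sorted(os.listdir(source_dir))
        if os.path.getsize(os.path.join(source_dir, name)) > 0
    ]
    random.Random(seed).shuffle(usable)

    # Compute the cut point only after filtering (assumed pitfall, see above).
    n_train = int(len(usable) * split_size)
    for i, name in enumerate(usable):
        target = train_dir if i < n_train else val_dir
        shutil.copyfile(os.path.join(source_dir, name), os.path.join(target, name))
    return n_train, len(usable) - n_train
```

Returning the two counts makes it easy to print them and confirm that both folders actually received files before training starts.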
When training accuracy is way higher than validation accuracy, there’s overfitting in play. Here are a few ways to tackle this:
- Adjusting the transformations (data augmentation) you perform on the training data. When validation accuracy is way below training accuracy, it helps to expose the NN to a wider distribution of the training data.
- Adding tf.keras.layers.Dropout
- Tuning the learning rate (hint: reduce it).
- Tuning the batch size
- Changing NN architecture.
A good amount of these are covered in deep learning specialization courses 2 and 3.
Some tips and common mistakes:
- Don’t forget to define an input shape in the first layer
- Remember that convolutional filters should have an odd-sized kernel so that there is a center pixel
- Pooling filters are typically even sized kernels
- Don’t forget to give your DNN layers an activation! I forgot this and the default is linear (or none)
- I’ve noticed 3x3 conv filter size across many architectures.
- As far as pooling size is concerned, it’s better to leave it at the default from tensorflow (which is set to 2x2), unless you have a strong reason to customize.
Thank you, Balaji. This post helped me clear the accuracy requirement for the model. I think I understood a few things about data augmentation: more layers give better training and testing accuracy, and the best optimizer differs with and without data augmentation. So with data augmentation, the Adam optimizer is the better option.
Thank you
DP
The trick for me was to set the learning rate to 0.001 rather than 1e-4 and use batch sizes of 32 rather than larger. Oddly enough more convolution layers and more dense layer neurons did not seem to help. I used 3 convolution layers, similarly to previous assignments.
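As a toy illustration of why a learning rate of 1e-4 can need far more training than 0.001, the sketch below counts plain gradient-descent steps on f(w) = w**2. This is an assumption-laden simplification: the assignment trains a CNN with an adaptive optimizer, not this one-parameter problem, and the function name and tolerance are made up for the example.

```python
def steps_to_converge(learning_rate, start=1.0, tolerance=0.01, max_steps=1_000_000):
    """Count plain gradient-descent steps on f(w) = w**2 until |w| < tolerance."""
    w = start
    for step in range(1, max_steps + 1):
        w -= learning_rate * 2 * w  # the gradient of w**2 is 2*w
        if abs(w) < tolerance:
            return step
    return max_steps


for lr in (1e-3, 1e-4):
    print(f"lr={lr}: {steps_to_converge(lr)} steps")
```

On this toy problem the smaller learning rate needs roughly ten times as many steps, which matches the intuition that a fixed budget of epochs may simply be too short for it.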
Could I know the reason why we should “Increase the number of filters per conv layer gradually with depth.”?
Why can’t I just keep the same number of filters or reduce some?
btw, is it necessary to increase by double?
The hyperparameter choice of number of filters as we go deeper in the network is picked based on other well known NNs like VGG16.
While there is no formula for the exact number of filters you should place at a layer, the common practice is to increase the number of filters by a factor of 2 with each additional conv layer.
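One commonly cited intuition, offered here as a heuristic rather than a rule, is that each conv + pooling pair shrinks the spatial size of the feature maps, which leaves room to add channels. The sketch below traces that in plain Python. The 150x150 input, the starting count of 16 filters, and the helper name `trace_feature_maps` are assumptions for illustration, not values from the assignment.

```python
def trace_feature_maps(input_size, filters_per_layer, kernel_size=3, pool_size=2):
    """Return (height/width, channels) after each conv + max-pool pair.

    Assumes 'valid' padding, stride-1 convs, and non-overlapping pooling.
    """
    size = input_size
    shapes = []
    for filters in filters_per_layer:
        size = size - kernel_size + 1   # conv with 'valid' padding
        size = size // pool_size        # max pooling
        shapes.append((size, filters))
    return shapes


filters = [16 * 2 ** i for i in range(3)]  # doubling schedule: 16, 32, 64
for size, channels in trace_feature_maps(150, filters):
    print(f"{size}x{size}x{channels}")
```

Doubling is a convention, not a requirement: keeping the filter count constant or growing it more slowly is also a valid experiment.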
I plan to use KerasTuner. What do you think?
I mean using KerasTuner as AutoML to search hyperparameters, like layer size and learning rate.
I don’t follow you. Please rephrase your reply.
[snippet removed by mentor]
After searching the CNN's parameters, I used them and got 82% accuracy.