Hello, I have been trying for weeks now to create a model that achieves 80% accuracy on both the training and validation sets. I feel at this point I have tried every possible combination of the following:
- using different numbers of convolutional layers (between 3 and 5)
- using different optimizers (Adam, and RMSprop with learning_rate = 0.001)
- using different numbers of filters in the convolutional layers (starting at 16 and usually going up by powers of two, all the way up to 256 filters if I am using 5 total layers)
I am following the observed conventions of using a 3x3 filter size, a 2x2 max-pooling size, etc. I really am at my wits' end here. In fact, I keep running into usage limits on Colab because of all of the different model architectures I have been trying. I have even tried using dropout (even though I know this is out of scope for this assignment) and still have not been able to find a suitable model architecture.
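For reference, here is one representative configuration out of the combinations above (a sketch; the 150x150 input shape and binary sigmoid output are just how my notebook happens to be set up):

```python
import tensorflow as tf

# One of the many variants I have tried: 4 conv blocks with 3x3 filters
# doubling from 16, each followed by 2x2 max pooling.
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu',
                           input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```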
What should I do?
You can download the notebook and solve the assignment on your local machine. There’s no need to get stuck because of Colab usage limits.
If this doesn’t help, please click my name and message your notebook as an attachment with your best performing model code.
Thank you for getting back to me. I just messaged you. How do I download it to my local machine? Are you suggesting I use something like VS Code to run it as a local Jupyter notebook?
- Batch size and learning rate are related. The default learning rates are tuned to work well with a batch size of 32. Should you choose to use larger batch sizes like 64 or 128, the learning rate should be adjusted accordingly. I suggest you go through the materials in Deep Learning Specialization Courses 2 and 3. When I changed the batch size in your notebook to 32, your model reached more than 80% training accuracy, though validation accuracy stayed low.
- Consider using layers like Dropout to prevent overfitting (a quick sketch of both points follows below).
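For example (a minimal sketch of the two points together; the learning rate, dropout rate, and layer sizes are illustrative, not tuned values):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Dropout between the dense layers to reduce overfitting, and Adam with an
# explicit learning rate sized for a batch size of 32.
model = tf.keras.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D(2, 2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),  # randomly zeroes 50% of activations during training
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
```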
Hope this serves as a good starting point.
As far as downloading the notebook is concerned, you can use the File menu in the Colab UI.
Consider Google Colab since they offer access to very nice GPUs at a good price.
Sorry, I don’t know much about VS Code. You can install Jupyter locally, though.
When describing the batch size, are you referring to the train_generator batch size or the validation_generator batch size? And wait, you said that when you re-ran it, the validation accuracy was low? Isn’t that a problem, since the validation accuracy also must be over 80%, or am I misunderstanding the assignment?
I have used dropout in different configurations of my model, but none have been able to achieve a training and validation accuracy over 80%.
So after downloading the notebook, I would also need to install Jupyter locally, correct? Or can I download and run the notebook without installing Jupyter locally?
Since Dropout is a form of regularization, using a lot of Dropout will cause the training cost to increase.
I’m referring to setting the train batch size to 32. Validation accuracy must be over 80% as well. I was highlighting the detail that there is a relationship between learning rate and batch size. If you don’t reach a training accuracy of 80%, odds are slim that you’ll reach validation accuracy of 80%.
Please review the materials in the Deep Learning Specialization on tuning the learning rate. If you haven’t taken the courses there, I recommend you take them before moving forward.
Do install Jupyter locally to run the notebook on your desktop. See https://jupyter.org/
I have not had any issue getting the training accuracy above 80%; the issue is getting the validation accuracy above 80%. So are you saying I should tune the learning rate for the Adam optimizer, or for RMSprop?
I just reran the model I sent you (after having adjusted the learning rate) and unfortunately, while the training accuracy was 0.8260, the validation accuracy was 0.5344. As I previously mentioned, I already tried using dropout (among other things) to try and improve the model performance. What else should I try in order to get the validation accuracy above 80%?
Before I reran the model, I did set the batch size to 32 for both train_generator and validation_generator.
The batch size in the notebook you sent is 128 for the training set and 32 for the validation set.
Do the following:
- Leave the optimizer as Adam.
- Use a batch size of 32 for both the training and validation datasets (as in the sketch below).
- Change the model architecture to achieve the desired performance.
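Concretely, something like this (a sketch assuming the assignment's ImageDataGenerator setup; the directory paths and image size are placeholders):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)

# With generators, the batch size is set here, not in model.fit().
train_generator = train_datagen.flow_from_directory(
    'train_dir',              # placeholder path
    target_size=(150, 150),   # placeholder image size
    batch_size=32,
    class_mode='binary')

validation_generator = validation_datagen.flow_from_directory(
    'validation_dir',         # placeholder path
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

# Then compile your existing model with Adam as before:
# model.compile(optimizer='adam', loss='binary_crossentropy',
#               metrics=['accuracy'])
```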
Have you taken the Deep Learning Specialization?
Hmm, alright. I will try a few different model architectures with a batch size of 32 for both training and validation. Also, it is common convention to start with 32 layers in the first convolutional layer, correct?
I completed the first course (Neural Networks and Deep Learning) as well as part of the second course (Improving Deep Neural Networks: Hyperparameter Tuning, Regularization, and Optimization). I skipped ahead to this course because I studied data science at UC Berkeley, and found the content of the second course in the deep learning specialization to be repetitive compared to what I learned at Cal. I have found this course (Convolutional Neural Networks in TensorFlow) to be a very nice application of the more theoretical perspective I learned in college.
Completing Courses 4 (Convolutional Neural Networks) and 5 (Sequence Models / RNNs) in the Deep Learning Specialization might have been useful.
Congrats on completing the data science program at UC Berkeley. I’ve heard really nice things about your school.
That said, I don’t know whether all the content taught in the Deep Learning Specialization (DLS) was covered at Cal:
- For instance, 32 refers to the number of filters, not layers. A Conv2D layer is made up of a number of filters.
- The relationship between learning rate and batch size is covered in DLS.
- Popular network architectures (which answer the question about 32 filters in the first Conv2D layer), as pointed out by @TMosh, are covered in Course 4.
- The bias/variance tradeoff matters when improving validation set performance; keep augmentation in mind (see the sketch below).
Consider completing DLS to have a good understanding of these details.
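On the augmentation point, a minimal sketch of the ImageDataGenerator-style augmentation this course covers (the specific parameter values are illustrative):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augment only the training data; the validation data should only be
# rescaled, so validation accuracy measures performance on unmodified images.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)
```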
I’m learning about learning rate decay in C2W2 of the Deep Learning Specialization, and I think using decay may also help increase the accuracy. A large learning rate near the minimum causes the cost to circle around the minimum; decay lets it settle much closer to the minimum. (Check out the programming assignment of that week for more info.)
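For example, Keras has a built-in exponential decay schedule (a sketch; the decay numbers are placeholders, not tuned values):

```python
import tensorflow as tf

# lr(step) = initial_learning_rate * decay_rate ** (step / decay_steps)
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001,  # placeholder starting rate
    decay_steps=1000,             # placeholder: steps over which lr decays by decay_rate
    decay_rate=0.9)               # placeholder multiplicative factor

optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```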