Hello, I have been trying for weeks now to create a model that achieves 80% accuracy on both the training and validation sets. At this point I feel I have tried every possible combination of the following:
using different numbers of convolutional layers (between 3 and 5)
using different optimizers (Adam, and RMSprop with learning_rate = 0.001)
using different values for the number of filters in the convolutional layers (starting at 16 and usually going up by powers of two, all the way up to 256 filters if I am using 5 total layers)
I am obeying the usual conventions of using a 3x3 filter size, a 2x2 max pooling size, etc. I really am at my wits' end here. In fact, I keep running into usage limits on Colab because of all of the different model architectures I have been trying. I have even tried using dropout (even though I know this is out of scope for this assignment) and still have not been able to find a suitable model architecture.
Thank you for getting back to me. I just messaged you. How do I download it to my local machine? Are you suggesting I use something like VSCode to do it in a local Jupyter notebook?
Batch size and learning rate are related. The default learning rates are set to work with a batch size of 32. Should you choose to use larger batch sizes like 64 or 128, the learning rate should be adjusted accordingly. I suggest you go through the materials in courses 2 and 3 of the Deep Learning Specialization. Changing the batch size to 32 improved your model's performance to more than 80% training accuracy, though the validation accuracy was still low.
Consider using layers like Dropout to prevent overfitting.
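Here is a minimal sketch of what I mean, using the kind of architecture you described (3x3 convolutions, 2x2 max pooling, filters doubling from 16) with a Dropout layer added before the dense layers. I'm assuming 150x150 RGB inputs and a binary classification task as in the course exercises; adjust the input shape, final layer, and loss if your assignment differs.

```python
import tensorflow as tf

# Small CNN in the style described above, with Dropout added to fight overfitting.
# Input shape and binary output are assumptions; change them to match your data.
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),                 # randomly drops 50% of activations during training
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Default-style learning rate, which is tuned around a batch size of 32.
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy'],
)
```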
Hope this serves as a good starting point.
As far as downloading the notebook is concerned, you can use the file menu from the UI to download the notebook.
When describing the batch size, are you referring to the train_generator batch size or the validation_generator batch size? And wait, you said that when you re-ran it, the validation accuracy was low? Isn't that a problem, since the validation accuracy also must be over 80%, or am I misunderstanding the assignment?
I have used dropout in different configurations of my model, but none have been able to achieve a training and validation accuracy over 80%.
So after downloading the notebook, I would also need to install jupyter locally, correct? Or can I download and run the notebook without installing jupyter locally?
I'm referring to setting the train batch size to 32. Validation accuracy must be over 80% as well. I was highlighting the fact that there is a relationship between the learning rate and the batch size. If you don't reach a training accuracy of 80%, odds are slim that you'll reach a validation accuracy of 80%.
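Concretely, the batch size I mean is the one you pass to the training generator. The directory names and target size below are placeholders; use your own values.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)

# This batch_size=32 is the "train batch size"; the default learning rates
# are tuned around this value because it determines each gradient update.
train_generator = train_datagen.flow_from_directory(
    'train_dir',               # placeholder path
    target_size=(150, 150),    # assumption; match your model's input shape
    batch_size=32,
    class_mode='binary')       # assumption; match your task

# The validation generator's batch size only controls how data is fed during
# evaluation, so it does not interact with the learning rate the same way.
validation_generator = val_datagen.flow_from_directory(
    'validation_dir',          # placeholder path
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
```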
Please review the materials on tuning the learning rate in the Deep Learning Specialization. If you haven't taken those courses, I recommend you take them before moving forward.
Do install Jupyter locally to run the notebook on your desktop. See https://jupyter.org/
I have not had any issue getting the training accuracy above 80%; the issue is getting the validation accuracy above 80%. So are you saying I should tune the learning rate for the Adam optimizer, or for RMSprop?
I just reran the model I sent you (after having adjusted the learning rate) and unfortunately, while the training accuracy was 0.8260, the validation accuracy was 0.5344. As I previously mentioned, I already tried using dropout (among other things) to try and improve the model performance. What else should I try in order to get the validation accuracy above 80%?
Hmm, alright, I will try a few different model architectures with a batch size of 32 for both training and validation. Also, it is common convention to start with 32 filters in the first convolutional layer, correct?
I completed the first course (Neural Networks and Deep Learning) as well as part of the second course (Improving Deep Neural Networks: Hyperparameter Tuning, Regularization, and Optimization). I skipped ahead to this course because I studied data science at UC Berkeley, and found the content of the second course in the deep learning specialization to be repetitive compared to what I learned at Cal. I have found this course (Convolutional Neural Networks in TensorFlow) to be a very nice application of the more theoretical perspective I learned in college.
I'm learning about learning rate decay in C2W2 of the Deep Learning Specialization, and I think using decay may also help increase the accuracy. A large learning rate near the minimum causes the cost function to circle around the minimum; decaying the learning rate lets the cost settle much closer to the minimum. (Check out the programming assignment of that week for more info.)
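If you want to try decay in Keras, one way (not necessarily how that assignment implements it) is a built-in schedule such as ExponentialDecay; the decay_steps and decay_rate values below are only illustrative.

```python
import tensorflow as tf

# Exponential decay: start at 1e-3 and multiply the learning rate by 0.9
# every 1000 training steps, so updates shrink as you approach the minimum.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=1000,   # illustrative value
    decay_rate=0.9,     # illustrative value
)

# Pass the schedule in place of a fixed learning rate when compiling your model:
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
# model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
```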