If the hints below don’t help, please click my name and message me your notebook as an attachment:
The assignment note hints at using no more than 2 Conv2D and 2 MaxPooling2D layers to achieve the desired performance. This limit doesn’t apply to other layers like Dense.
Hey, I referred to the same post to clear the week 3 assignment, but I have a doubt about this section. Besides 2 Conv2D and 2 MaxPooling2D layers, I have used 2 Dense layers. I didn’t include dropout. In my last Dense layer I used 26 units, which in one of the posts you replied not to use, so I tried using 3, and that’s when I get a ValueError. I also tried the Adam optimizer first, which still gave the same result. Then I used RMSprop, which reduced the accuracy.
This post is related to week 4. About the thread you just told me to refer to: I only mentioned that it helped me achieve the expected accuracy in the week 3 assignment. But in week 4, when I apply the same variables as you mentioned, I am not getting the expected accuracy. Hope that is clear now.
The link you sent for reference shows rescaling added as a model layer, but in the course rescaling is only added in train_datagen. These links are confusing me more. Please clarify this doubt for me.
[THE CODE BELOW IS NOT PART OF ANY ASSIGNMENT. IT WAS POSTED ONLY TO CLARIFY A DOUBT AND COMES FROM THE TENSORFLOW LIBRARY]
As far as augmentation is concerned, apply each technique only after experimenting to see its impact on training and validation performance.
Here’s what it means:
When training dataset accuracy is low, the model is underfitting the dataset (i.e. high bias). The model needs to improve to better fit the dataset.
When training dataset accuracy is high and validation dataset accuracy is low, the model is suffering from high variance (overfitting). The model needs to cope better with the distribution of the validation dataset. Data augmentation applied to the training set helps cover the distribution of the validation dataset in this situation.
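The two rules above can be sketched as a small check on the final accuracies of a run. This is only an illustration: `diagnose` is a hypothetical helper, and the 0.90 target and 0.05 gap are assumed thresholds, not values from the course or the assignment.

```python
def diagnose(train_acc, val_acc, target=0.90, max_gap=0.05):
    """Label a training run from its final accuracies.

    target and max_gap are assumed thresholds, chosen only for illustration.
    """
    if train_acc < target:
        # Low training accuracy: the model does not fit the training set.
        return "underfitting (high bias)"
    if train_acc - val_acc > max_gap:
        # Good on training data but much worse on validation data.
        return "overfitting (high variance)"
    return "ok"


print(diagnose(0.62, 0.60))
print(diagnose(0.99, 0.70))
print(diagnose(0.99, 0.97))
```

In the first case the fix is a better model; in the second, augmentation or regularization is the usual thing to try.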
Rescaling can be applied either as part of the dataset generator or by including a rescaling layer within the model. Both approaches produce the same results. The reason for asking you to study the links was to highlight that the number of conv filters increases as you go deeper in the network.
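To make the rescaling point concrete, here is a pure-Python sketch that needs no TensorFlow. All names are hypothetical: `scale_in_pipeline` stands in for a generator that rescales by 1/255, `model_with_rescaling` for a model whose first step rescales, and `core_model` is just a placeholder for the conv and dense stack.

```python
SCALE = 1.0 / 255


def core_model(pixels):
    # Placeholder for the conv/dense stack: simply the mean of its inputs.
    return sum(pixels) / len(pixels)


def scale_in_pipeline(raw_pixels):
    # Option A: the data pipeline rescales, so the model sees values in [0, 1].
    return [p * SCALE for p in raw_pixels]


def model_with_rescaling(raw_pixels):
    # Option B: the model rescales as its first step.
    return core_model([p * SCALE for p in raw_pixels])


raw = [0, 51, 102, 255]
out_a = core_model(scale_in_pipeline(raw))
out_b = model_with_rescaling(raw)
print(out_a == out_b)  # both options feed the same values to the stack

# Applying both rescales twice, which changes what the stack sees.
out_both = model_with_rescaling(scale_in_pipeline(raw))
print(out_both == out_a)
```

So pick one place for the rescaling, not both.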
The Deep Learning Specialization does go into model diagnosis in detail. This is a good time to revise that material to better tackle this situation.
Did you go through the ungraded labs in the first 2 courses of this specialization that perform image classification?
As far as the links I gave you, do look at the model architecture in terms of activation functions, filter sizes, number of dense layers and the number of units of each dense layer other than the final dense layer.
I will be honest, Balaji: I used to skip most of the ungraded labs, but for TensorFlow I did go through them.
I have followed the instructions as given. [snippet removed - mentor]
The difference I noticed between the ungraded lab and the assignment I am doing is that the batch size differs. Can that have an effect?
Also, the assignment clearly says to use no more than 2 Conv2D and 2 MaxPooling2D layers, and I followed that.
For image augmentation I have used a simple rescale plus other parameters for the training images. Do I need to change these for training to reach better accuracy? For example, I have set rotation range to 40, and shear and zoom range to 0.2.
I really don’t want you to give me the correct code directly, but please clear my doubt about whether I am looking at the right spot to correct my model.
No worries. Tuning hyperparameters manually is a time-consuming task. Assuming you have read about underfitting and overfitting, go ahead and try different model configurations until you find a model and augmentation techniques that work well for the dataset. I recommend keeping a spreadsheet to note down model performance for each configuration after 15 epochs.
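If you would rather keep that record from the notebook itself, a CSV file works as a spreadsheet. A minimal sketch using only the standard library follows; the column names are my own choice, and the two rows are made-up placeholders, not results from the assignment.

```python
import csv
import io

FIELDS = ["conv_filters", "dense_units", "augmentation", "batch_size",
          "train_acc", "val_acc"]


def log_runs(runs, handle):
    """Write one row per model configuration to a CSV file-like object."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for run in runs:
        writer.writerow(run)


# Made-up placeholder rows: replace with your own numbers after 15 epochs.
runs = [
    {"conv_filters": "32-64", "dense_units": 512,
     "augmentation": "rotation+shear+zoom+flip", "batch_size": 32,
     "train_acc": 0.80, "val_acc": 0.75},
    {"conv_filters": "32-64", "dense_units": 512,
     "augmentation": "rescale only", "batch_size": 32,
     "train_acc": 0.95, "val_acc": 0.93},
]

# In-memory buffer for the demo; use open("runs.csv", "w", newline="")
# instead to save a file you can open in a spreadsheet.
buffer = io.StringIO()
log_runs(runs, buffer)
print(buffer.getvalue())

# Pick the configuration with the best validation accuracy so far.
best = max(runs, key=lambda r: r["val_acc"])
print(best["augmentation"])
```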
An optimizer updates model weights for every batch. So, batch size does have an effect on training.
32 is a commonly used value for batch size. A popular heuristic is to try batch sizes that are powers of 2.
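Since the optimizer updates weights once per batch, the batch size directly sets how many updates happen in one epoch. A small sketch of that arithmetic; the dataset size of 20,000 is an assumed round number, not the size of the assignment dataset.

```python
import math


def updates_per_epoch(num_samples, batch_size):
    """Number of batches, and so weight updates, in one pass over the data."""
    return math.ceil(num_samples / batch_size)


NUM_SAMPLES = 20_000  # assumed dataset size, for illustration only

# Try batch sizes that are powers of 2.
for batch_size in (16, 32, 64, 128):
    print(batch_size, updates_per_epoch(NUM_SAMPLES, batch_size))
```

Doubling the batch size roughly halves the number of updates per epoch, which is one reason the same model can behave differently in the ungraded lab and the assignment.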
OK, I finally achieved 99% accuracy with the help of mentor @balaji.ambresh’s guidance and this post. I just wanted to point out what I understood from the week 4 assignment, hoping it will help someone in future just the way the attached post link from @shiro helped me.
When training a model on two classes, image augmentation can range from rotation to horizontal flip, depending on the images being trained on. With more than two classes, it helps to keep training_datagen as simple as possible.
I also noticed the importance of batch_size: in my experiments, as the batch size increased, the network needed fewer layers.
Understanding data augmentation is a must for this exercise. I removed three hyperparameters (rotation_range, horizontal_flip, shear_range) to reach 99% accuracy, which suggests that training_datagen should be kept as simple as possible when training a multi-class model.
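For anyone following along, the change described above can be sketched as plain dictionaries of keyword arguments in the style of Keras’ ImageDataGenerator. The values are reconstructed from numbers mentioned in this thread (rotation 40, shear and zoom 0.2); `horizontal_flip=True` is an assumption, and the actual notebook may have had other parameters.

```python
# Augmentation settings as described earlier in the thread (partly assumed).
heavy_aug = {
    "rescale": 1.0 / 255,
    "rotation_range": 40,
    "shear_range": 0.2,
    "zoom_range": 0.2,
    "horizontal_flip": True,  # assumed value, not stated in the thread
}

# The three hyperparameters that were removed.
removed = {"rotation_range", "horizontal_flip", "shear_range"}

simple_aug = {k: v for k, v in heavy_aug.items() if k not in removed}
print(sorted(simple_aug))

# With TensorFlow installed, either dict could be passed as
# ImageDataGenerator(**simple_aug) to build the training generator.
```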
Great suggestions here, thanks to everyone for the helpful comments. I had mostly been recycling the ImageDataGenerator hyperparameters from the ungraded labs while going through these courses, but I think with sign language there’s good reason to limit the image augmentation. Sign language symbols don’t necessarily have the symmetries that other images have (like a cat pic or something).
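A toy illustration of the symmetry point, using made-up 3x3 and 3x4 grids rather than real images. It only shows that a left-right flip leaves a symmetric pattern unchanged and turns an asymmetric one into a different pattern; whether that hurts accuracy on this dataset is still something to confirm by experiment.

```python
def horizontal_flip(image):
    """Mirror a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


# Made-up symmetric pattern: flipping it gives the same picture.
symmetric = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

# Made-up asymmetric "L" shape: flipping it gives a different picture.
asymmetric = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

print(horizontal_flip(symmetric) == symmetric)
print(horizontal_flip(asymmetric) == asymmetric)
```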