Training accuracy 86%, validation 94%; required training accuracy 99%, validation accuracy 95%

Hello, @balaji.ambresh

My model’s training accuracy is not reaching the expected value. Can you guide me on where I can improve my model?

In one of the related posts, you mentioned not to use 26 units. I have used 26 units; can that be the issue?

Thank you
DP

If the hints below don’t help, please click my name and message your notebook as an attachment:

  1. The assignment note hints at using no more than 2 Conv2D and 2 MaxPooling2D layers to achieve the desired performance. This limit doesn’t apply to other layers like Dense.
  2. See this and this.

Hey, I referred to the same post to clear the week 3 assignment, but I have a doubt about this section. Other than 2 Conv2D and 2 MaxPooling2D layers, I have used 2 Dense layers. I didn’t include dropout. In my last Dense layer I used 26 units, which you advised against in one of the posts, so I tried using 3, and that is when I got a ValueError. I also tried the Adam optimizer first, which still gave the same result. Then I used RMSprop, which reduced the accuracy.


In this reply under course 2 week 3, you mentioned a problem with course 2 week 4. That was the reason for creating this new topic.

I’m confused by the rest of your message. Please help me understand what the issue is.

This post is related to week 4. About the thread you just told me to refer to: I only mentioned that it helped me achieve the expected accuracy in the week 3 assignment. But in week 4, when I apply the same settings you mentioned, I am not getting the expected accuracy. I hope that is clear now.

Got it. Please click my name and message your notebook as an attachment.


Please check your inbox. I have sent you the notebook.

Please do the following:

  1. As mentioned in the previous reply, increase the number of conv filters as you move deeper in the network.
  2. Increase the number of units in the non-final dense layers as powers of 2, starting with a value like 32.
  3. Keep the optimizer as adam.
  4. Refer to model architectures in ungraded labs for image classification.
  5. Read the following TensorFlow docs: 1 and 2
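
Hints 1 and 2 can be sketched as plain number sequences, independent of any assignment code. The helper names `conv_filter_schedule` and `dense_unit_candidates` and the starting values are hypothetical, chosen only for illustration; the right values still have to be found by experiment.

```python
def conv_filter_schedule(first_filters, n_conv_layers):
    """Double the filter count at each successive conv layer."""
    return [first_filters * 2 ** i for i in range(n_conv_layers)]


def dense_unit_candidates(start, n_candidates):
    """Powers-of-2 unit counts to try for a non-final dense layer."""
    return [start * 2 ** i for i in range(n_candidates)]


# Hypothetical starting values, for illustration only.
print(conv_filter_schedule(16, 3))    # [16, 32, 64]
print(dense_unit_candidates(32, 4))   # [32, 64, 128, 256]
```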

Do you mean to say my train_generator is incorrect?

The link you sent for reference shows rescaling being added as a model layer, but in the course rescaling is only applied in train_datagen. These links are confusing me more. Please clarify this doubt for me.

[THE CODE BELOW IS NOT PART OF ANY ASSIGNMENT. IT IS FROM THE TENSORFLOW DOCUMENTATION AND IS POSTED ONLY TO CLARIFY A DOUBT.]

model = Sequential([
  layers.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(32, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Conv2D(64, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(num_classes)
])

As far as augmentation is concerned, apply each technique only after experimenting with the impact it has on training / validation accuracy.
Here’s what it means:

  1. When training dataset accuracy is low, the model is underfitting the dataset (i.e. high bias). The model needs to improve to better fit the dataset.
  2. When training dataset accuracy is high and validation dataset accuracy is low, the model is suffering from high variance. The model needs to cope better with the distribution of the validation dataset. Data augmentation applied to the training set helps cover the distribution of the validation dataset in this situation.
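
The two cases above can be written as a small check. This is only a rough sketch: the function name `diagnose` is hypothetical, and using the assignment's target accuracies as the cut-off for "low" is an assumption, not a rule from the course.

```python
def diagnose(train_acc, val_acc, target_train, target_val):
    """Rough bias/variance reading from final-epoch accuracies."""
    if train_acc < target_train:
        # Case 1: the model does not fit the training set well enough.
        return "high bias (underfitting): improve the fit to the training set"
    if val_acc < target_val:
        # Case 2: good training fit, weak validation performance.
        return "high variance: consider augmentation on the training set"
    return "targets met"


# Numbers from the thread title: 86% training, 94% validation;
# required 99% training, 95% validation.
print(diagnose(0.86, 0.94, 0.99, 0.95))
```

With the numbers from this thread the first branch fires, which points at the model's fit to the training data rather than at the validation set.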

Rescaling can be applied either as part of the dataset generator or by including a rescaling layer within the model. Both approaches produce the same results. The reason for asking you to study the links was to highlight how the number of conv filters increases as you go deeper in the network.

The Deep Learning Specialization does go into model diagnosis in detail. This is a good time to revise / study that material to better tackle this situation.


The accuracy went below 80% after adding more dense layers!

Did you go through the ungraded labs in the first 2 courses of this specialization that perform image classification?
As for the links I gave you, do look at the model architecture in terms of activation functions, filter sizes, number of dense layers and the number of units in each dense layer other than the final dense layer.

See this link to learn about overfitting.

I will be honest, Balaji: I did not use to check the ungraded labs much, but for TensorFlow I did go through them.

I have followed all the instructions given. [snippet removed - mentor]

The difference I noticed between the ungraded lab and the assignment I am doing is the batch size. Can that have an effect?

Also, the assignment clearly says not to use more than 2 Conv2D and 2 MaxPooling2D layers, and I followed that.

For image augmentation I have used a simple rescale plus other parameters for the training images. Do I need to change these for training to reach better accuracy? For example, I have set the rotation range to 40 and the shear and zoom ranges to 0.2.

I really don’t want you to give me the correct code directly; I only want to know whether I am looking in the right place to correct my model.

Sorry for the trouble.

No worries. Tuning hyperparameters manually is a time-consuming task. Assuming you have read about underfitting and overfitting, go ahead and try different model configurations and figure out a model and augmentation techniques that work well for the dataset. I recommend keeping a spreadsheet to note down model performance for each configuration after 15 epochs.
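
If a spreadsheet feels heavy, the same record can be kept as a CSV file. The sketch below is one possible layout: the column names and the helpers `log_run` and `best_run` are hypothetical, and the accuracies in the demo are placeholders, not real results.

```python
import csv
import os
import tempfile

FIELDS = ["run", "conv_filters", "dense_units", "batch_size",
          "augmentation", "epochs", "train_acc", "val_acc"]


def log_run(path, row):
    """Append one configuration and its results to a CSV file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # header only once, on the first write
        writer.writerow(row)


def best_run(path):
    """Return the logged row with the highest validation accuracy."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return max(rows, key=lambda r: float(r["val_acc"]))


# Demo with placeholder numbers, written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "experiments.csv")
log_run(demo_path, {"run": 1, "conv_filters": "16-32", "dense_units": 64,
                    "batch_size": 32, "augmentation": "rescale+rotation",
                    "epochs": 15, "train_acc": 0.80, "val_acc": 0.85})
log_run(demo_path, {"run": 2, "conv_filters": "32-64", "dense_units": 128,
                    "batch_size": 32, "augmentation": "rescale only",
                    "epochs": 15, "train_acc": 0.90, "val_acc": 0.88})
print(best_run(demo_path)["run"])
```

Keeping the epoch count fixed across rows makes the configurations comparable.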

An optimizer updates model weights for every batch. So, batch size does have an effect on training.
32 is a commonly used value for batch size. A popular heuristic is to try batch sizes that are powers of 2.
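
To see why batch size matters, count the weight updates in one epoch: one update per batch. The training-set size below is a hypothetical round number, not the size of the assignment dataset.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """One optimizer update per batch; the last batch may be partial."""
    return math.ceil(n_samples / batch_size)


n_samples = 10_000  # hypothetical training-set size
for batch_size in (16, 32, 64, 128):  # powers of 2
    print(batch_size, updates_per_epoch(n_samples, batch_size))
```

Doubling the batch size roughly halves the number of updates per epoch, so the same number of epochs gives the optimizer fewer steps.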


OK, I finally achieved 99% accuracy with the help of mentor @balaji.ambresh’s guidance and this post. I just wanted to point out what I understood from the week 4 assignment, hoping it will help someone in the future just the way the attached post from @shiro helped me.

  1. When training a model on two classes, image augmentation can range from rotation to horizontal flips, depending on the images being trained on. With more than two classes, keep training_datagen as simple as possible so that training is less complicated.

  2. I also noticed the importance of batch_size: as the batch size increases, the network should have fewer layers.

  3. Understanding data augmentation is a must for this exercise. I removed three hyperparameters (rotation_range, horizontal_flip and shear_range) to achieve 99% accuracy, which shows that the training datagen should be as simple as possible when training a multi-class model.
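
As an illustration of point 3 only (not the assignment code), the change can be pictured as dropping three keyword arguments from the augmentation settings. The rotation, shear and zoom values are the ones mentioned earlier in this thread; `horizontal_flip=True`, keeping `zoom_range`, and the helper name `simplify` are assumptions.

```python
# Keyword arguments in the style of Keras ImageDataGenerator.
# horizontal_flip=True is an assumed value.
original_kwargs = {
    "rescale": 1.0 / 255,
    "rotation_range": 40,
    "shear_range": 0.2,
    "zoom_range": 0.2,
    "horizontal_flip": True,
}

REMOVED = ("rotation_range", "horizontal_flip", "shear_range")


def simplify(kwargs, removed=REMOVED):
    """Return a copy of kwargs without the listed augmentation options."""
    return {k: v for k, v in kwargs.items() if k not in removed}


simplified_kwargs = simplify(original_kwargs)
print(sorted(simplified_kwargs))  # ['rescale', 'zoom_range']
```

A dict like `simplified_kwargs` could then be unpacked into the generator constructor, for example `ImageDataGenerator(**simplified_kwargs)`.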

The reference link is below.

Thank you
DP


Great suggestions here, thanks to everyone for the helpful comments. I had mostly been recycling the ImageDataGenerator hyperparameters from the ungraded labs while going through these courses, but I think with sign language there’s good reason to limit the image augmentation. Sign language symbols don’t necessarily have the symmetries that other images (like a cat pic or something) have.
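
A toy way to see the symmetry point, using tiny made-up grids rather than real images: a horizontal flip leaves a symmetric shape unchanged but turns an asymmetric one into a different shape. Whether a flipped hand sign still reads as the same letter depends on the dataset, so this only illustrates the geometry.

```python
def hflip(image):
    """Mirror a 2D grid left to right, as a horizontal flip would."""
    return [row[::-1] for row in image]


# Toy 3x3 binary grids, not real data.
symmetric = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
asymmetric = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]

print(hflip(symmetric) == symmetric)    # True
print(hflip(asymmetric) == asymmetric)  # False
```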

Please don’t post code in public.

Here’s the community user guide.