How to reach 99% training accuracy with only 2 Conv2D layers?


Did anybody manage to get 99% accuracy on the training set? I tried using two Conv2D layers and one Dense layer (up to 1024 neurons) but only managed to get around 90% on training and 96% on validation.

Any suggestions?


Hi, @frzuritaa!

It seems that you could benefit from increasing the size and complexity of the model. Try more filters in the convolutional layers (as you should not use more than two conv layers) or a couple of bigger dense layers before the output.


Hi @frzuritaa.
After some hours of tuning, I was able to reach the following:

My advice is: don’t make it too complicated. We are working with images of shape 28x28 here, so many conv layers and many neurons in the fully connected layers won’t solve the problem within 15 epochs (it can even cause errors, because after several convolutions you can end up with negative dimensions). So try to keep it really simple. Using every image pre-processing feature is also not a good idea if you want to hit the target in 15 epochs or fewer. So simplification is the way! Wish you good luck!
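To illustrate the point about negative dimensions, here is a minimal sketch of the shape arithmetic. It assumes 3x3 convolutions without padding and 2x2 max pooling (the layer settings that appear later in this thread); the helper names are made up for this example.

```python
def conv_output_size(size, kernel=3, stride=1):
    """Spatial size after a convolution with no padding ('valid')."""
    return (size - kernel) // stride + 1


def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping max pooling (stride == pool size)."""
    return size // pool


def trace_sizes(input_size=28, blocks=5):
    """Stack Conv(3x3) + MaxPool(2x2) blocks and record the size after each block.

    Stops as soon as a convolution would produce a size below 1.
    """
    sizes = [input_size]
    size = input_size
    for _ in range(blocks):
        size = conv_output_size(size)
        if size < 1:
            sizes.append(size)  # the convolution no longer fits
            break
        size = pool_output_size(size)
        sizes.append(size)
    return sizes


print(trace_sizes())  # [28, 13, 5, 1, -1]
```

After three conv + pool blocks a 28x28 image is already down to 1x1, so a fourth 3x3 convolution has nothing left to slide over. That is the kind of situation where Keras reports a negative dimension error.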

Hint: I don’t use more than 32 filters in the conv layers, and 512 is the maximum number of neurons I use in the Dense layers.


Hey, I used the same 32 filters for the conv layers and a maximum of 512 neurons, but my training accuracy is still below 90%.

My batch size is 32, so do I need to increase the maximum number of neurons?

How many layers do you use in total?

I already solved the issue. Thank you for replying.

I tried

import tensorflow as tf

model = tf.keras.models.Sequential([
    # The first convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(25, activation='softmax')
])

but only got to about 83% accuracy on the training set. I played with the dropout and that didn’t change the results much. I assume it would actually be safe to keep the dropout small, because the learning curves suggest we are not overfitting. Moving from 32 to 64 filters improves the training accuracy by 5%. I have no idea how you could get 99% with 32 filters. Is it based only on the model architecture or also on the image augmentation?

Hi @Rok_Bohinc. This is the MNIST Sign Language Assignment, right?
I tried it again after more than a year and got 99%+ after 8 epochs.

Please read the following part of the assignment carefully:

Welcome to this assignment! In this exercise, you will get a chance to work on a multi-class classification problem. You will be using the Sign Language MNIST dataset, which contains 28x28 images of hands depicting the 26 letters of the english alphabet.

There is something VERY IMPORTANT in there that can help you understand what’s wrong in your code. It’s really a minor issue that is causing the low performance. If it still doesn’t work properly, let me know and I’ll try to guide you.

The assignment also says the following:

Note: The documentation of the dataset mentions that there are actually no cases for the last letter, Z, and this will allow you to reduce the recommended number of output units above by one. If you’re not yet convinced, you can safely ignore this fact for now and study it later. You will pass the assignment even without this slight optimization.

That is why I put 25 and not 26 units in the last layer. Increasing it to 26 doesn’t solve the problem, it actually makes accuracy drop:

Epoch 15/15
858/858 [==============================] - 15s 17ms/step - loss: 0.5815 - accuracy: 0.8075 - val_loss: 0.2044 - val_accuracy: 0.9361
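For anyone wondering where 25 comes from, here is a small sketch. It assumes the labels are integers fed to a sparse categorical loss, in which case the softmax layer needs at least `max(label) + 1` units. The label list below is hypothetical: it mimics a dataset whose labels run from 0 to 24 with 9 (J) absent, which is my reading of the dataset documentation, so check it against your own label column.

```python
def required_output_units(labels):
    """Smallest softmax width that can index every integer label."""
    return max(labels) + 1


# Hypothetical labels: 0..24 with 9 missing (assumption, verify on the real CSV)
example_labels = [i for i in range(25) if i != 9]

print(required_output_units(example_labels))  # 25
print(len(set(example_labels)))               # 24 distinct classes
```

The number of output units follows the largest label value, not the number of distinct classes, so 25 units are enough here even though one index in the middle is never used.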

Hi @Rok_Bohinc.

Then please send me your notebook via DM. I can check it myself.


Hi @Rok_Bohinc.

Thanks for sharing your notebook, but you should always do that via DM (direct message), so please remove it from here. I quickly checked it and it looks good at first sight…

… but if I look at your outputs, it’s visible that the accuracy doesn’t go up that quickly:

What I did:

  • checked your NN model - it’s OK and pretty reasonable (meaning it’s simple enough)
  • checked what you do w/ the input images - there you apply a lot of augmentation that is unnecessary for this exercise (in your case it’s too much to reach the goal in fewer than 15 epochs) → try to reduce it (less is more here).
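To make "reduce the augmentation" a bit more concrete, here is a hedged sketch of what a heavy and a lighter configuration could look like. The values are purely illustrative (they are not the settings from the notebook discussed here), and it assumes a TensorFlow version that still ships `ImageDataGenerator`. The dict names and `build_generator` are made up for this example.

```python
# Illustrative keyword arguments for
# tf.keras.preprocessing.image.ImageDataGenerator (values are assumptions).
heavy_augmentation = {
    "rescale": 1.0 / 255,
    "rotation_range": 40,
    "width_shift_range": 0.2,
    "height_shift_range": 0.2,
    "shear_range": 0.2,
    "zoom_range": 0.2,
    "horizontal_flip": True,
    "fill_mode": "nearest",
}

# A lighter variant: keep the rescaling, keep only mild distortions.
light_augmentation = {
    "rescale": 1.0 / 255,
    "rotation_range": 10,
    "zoom_range": 0.1,
}


def build_generator(settings):
    """Create an ImageDataGenerator from a settings dict (needs TensorFlow)."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    return ImageDataGenerator(**settings)


# Usage (uncomment where TensorFlow is installed):
# train_datagen = build_generator(light_augmentation)
```

The idea is to start from rescaling only, then add one transformation at a time and watch whether training accuracy still reaches the target within the epoch budget.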

I played a little bit w/ your notebook and here is the output:

So the exercise target was achieved within 5 epochs. In case you still have an issue, let me know. Wish you the best of luck :+1:! You are VERY VERY CLOSE to finishing it!


@Rok_Bohinc: To be more specific about DM (direct message), here’s how to do it:

Just click any person’s name/icon/badge and you’ll see “Message”. Click it and you can share whatever you want in a private message.


Very interesting. I have also achieved a much better result by tuning down the augmentation.

The question is: how do you recognize whether you are doing too much or too little augmentation?

Is the thought process that you try some level of augmentation and look at the learning curves? In my original case I was clearly not overfitting. So is the conclusion that I should tune down the augmentation? How do you determine the level of augmentation?

Thank you for your input. I’m learning :muscle:

Best regards,

@Rok_Bohinc, here it just depends on the exercise target:

The target was to achieve 95%+ validation accuracy and 99%+ training accuracy within 15 epochs, all at once. So you needed some tuning to find the best combination without making things too complex.

Your approach is/was generally correct if there is no limit on the number of epochs. If you ran it for 50 epochs, you’d probably get better results (from your charts it’s visible that more epochs would help in your case).
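As a rough rule of thumb for reading the learning curves, you can compare the final training and validation accuracy. The sketch below is only a heuristic, not an official criterion: the function name, the labels it returns and the 0.02 tolerance are all assumptions made for this example. The numbers passed in are the epoch 15 values from the log posted earlier in this thread.

```python
def diagnose(train_acc, val_acc, tolerance=0.02):
    """Classify the train/validation gap with a simple threshold.

    - train well above validation: the model may be overfitting
      (more augmentation or dropout could help)
    - train well below validation: the (augmented) training data is harder
      than the validation data (less augmentation or more epochs could help)
    """
    gap = train_acc - val_acc
    if gap > tolerance:
        return "possible_overfitting"
    if gap < -tolerance:
        return "training_harder_than_validation"
    return "balanced"


# accuracy: 0.8075, val_accuracy: 0.9361 (epoch 15/15 from the log above)
print(diagnose(0.8075, 0.9361))  # training_harder_than_validation
```

When training accuracy sits far below validation accuracy, as in that log, the model is not overfitting. With a fixed epoch budget, that points toward weaker augmentation rather than a bigger model.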

I’m also learning every day :slight_smile: .

If it really helped you, please mark it as a solution. Thank you very much!
I’m glad you made it till the end :+1: .