I noticed that in the horses versus humans notebook, C1_W4_Lab_1_image_generator_no_validation.ipynb, the accuracy increases when convolutional layers are removed.

The original accuracy is around 92% (for me), but when the 5th convolutional layer is removed (including its pooling layer), accuracy reaches 100% in the last epoch. Moreover, if both the 4th and 5th layers are removed, accuracy reaches 100% consistently during the last six epochs.

Are these results correct / expected? It sounds counterintuitive.

For each test I re-ran the cells that defined the model, compiled the model and fit the model. I did not re-run the data preprocessing cell but there should be no need for that.

Edit: the number of trainable parameters goes up each time a layer is removed. Perhaps that explains it. But that leads to a question about how many layers you should use …
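To illustrate why the parameter count can go up when a convolutional block is removed, here is a rough sketch in plain Python. It assumes the architecture I believe the notebook uses (300x300x3 input, 3x3 convolutions without padding, each followed by 2x2 max pooling, filter counts 16/32/64/64/64, then Dense(512) and Dense(1)); the function name `count_params` is made up for this example. The point is that each removed conv/pool block leaves a larger feature map to flatten, and the first dense layer grows with it.

```python
def count_params(num_conv_blocks, input_size=300, channels=3,
                 filters=(16, 32, 64, 64, 64), dense_units=512):
    """Return (flattened_size, total_trainable_params) for an assumed CNN."""
    size, in_ch, total = input_size, channels, 0
    for out_ch in filters[:num_conv_blocks]:
        # 3x3 kernel weights for every input channel, plus one bias per filter
        total += (3 * 3 * in_ch + 1) * out_ch
        # unpadded 3x3 conv shrinks each side by 2, 2x2 max pooling halves it
        size = (size - 2) // 2
        in_ch = out_ch
    flat = size * size * in_ch
    total += (flat + 1) * dense_units   # Dense(512) on the flattened features
    total += dense_units + 1            # Dense(1) output unit
    return flat, total


for n in (5, 4, 3):
    flat, total = count_params(n)
    print(f"{n} conv blocks: flatten size {flat}, trainable params {total:,}")
```

Under these assumptions, almost all of the parameters sit in the first dense layer, so dropping a pooling stage multiplies the total several times over.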

Edit 2: After going through the remainder of the training for this week I may be able to answer my own question. The number of layers should be such that accuracy on the training and validation sets is high and the model does not overfit.

Would you mind posting a public Colab of your code? I guess it would be a copy of your work.

Just my two cents: accuracy is only one of the metrics for judging a model's performance, and to some degree it can help you spot underfitting. However, I suggest also looking at the loss (error) on the training and cross-validation sets to detect overfitting and get a clearer picture of the model's performance.

Why is monitoring the loss a useful tool? Here is my unofficial, intuitive example to help remember it:

You have logistic regression with a sigmoid activation.
You define a probability < 0.5 as negative: 0
You define a probability >= 0.5 as positive: 1
To compute accuracy you use these 0 and 1 labels instead of the probability. The labels alone cannot tell you how far the prediction was from the ground-truth value in the dataset.

Let’s say you have one training example whose ground-truth label is “1”.
A bad model produces a probability of 0.51, which counts as positive, but the error, the distance between 0.51 and 1, is large.
A good model produces a probability of 0.9999, which counts as positive too, but the error, the distance between 0.9999 and 1, is tiny.

I hope you get the feeling: the latter gives much more confidence, right?
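The 0.51 versus 0.9999 example can be put into numbers with a small sketch. It assumes binary cross-entropy as the loss (the usual choice with a sigmoid output, though the post above only speaks of "distance"); the helper name `binary_cross_entropy` is just for illustration.

```python
import math


def binary_cross_entropy(y_true, p):
    """Loss for a single example with label y_true (0 or 1) and predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


y_true = 1
for p in (0.51, 0.9999):
    predicted_label = int(p >= 0.5)          # thresholding used for accuracy
    correct = predicted_label == y_true      # both predictions count as correct
    loss = binary_cross_entropy(y_true, p)   # but the losses differ a lot
    print(f"p={p}: correct={correct}, loss={loss:.4f}")
```

Both predictions contribute the same to accuracy, while the loss for 0.51 is several thousand times larger than the loss for 0.9999.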

Applying this to training:

When a model overfits, the loss (error) on the training set looks good, meaning small, and the accuracy is correspondingly high (the model predicts positive where positive is expected, and likewise for negative).
Meanwhile, on the cross-validation set the loss (error) is bad, meaning larger, even though the accuracy there may still look not so bad.
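As a rough sketch of how one might compare the two losses after training: Keras returns a `History` object from `model.fit`, and its `history` dict holds per-epoch `loss` and, when validation data is supplied, `val_loss`. The helper `summarize_history` and the numbers below are made up purely for illustration; they are not results from the notebook, which in this version does not use a validation set at all.

```python
def summarize_history(history):
    """Return (epoch with lowest val_loss, final gap between val_loss and loss).

    `history` is a dict with 'loss' and 'val_loss' lists, shaped like
    the `history` attribute of the object returned by Keras `model.fit`.
    """
    val = history["val_loss"]
    best_epoch = min(range(len(val)), key=val.__getitem__)  # 0-indexed
    final_gap = history["val_loss"][-1] - history["loss"][-1]
    return best_epoch, final_gap


# Illustrative, made-up curves: training loss keeps falling,
# validation loss bottoms out and then rises again.
illustrative = {
    "loss":     [0.60, 0.35, 0.20, 0.10, 0.05],
    "val_loss": [0.62, 0.45, 0.40, 0.48, 0.60],
}

best_epoch, final_gap = summarize_history(illustrative)
print(f"lowest val_loss at epoch index {best_epoch}, final gap {final_gap:.2f}")
```

A validation loss that turns upward while the training loss keeps shrinking is the pattern described above, and it can show up even when validation accuracy still looks acceptable.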

I’d say this is true of many of the tunable parameters: training epochs, size of the training data set, training/validation split, and so on. You can hold all but one fixed and watch the impact of changes on the results. In the real world, you also often care about throughput: how long does it take to perform forward and backward propagation? This matters both because time and computation cost money and because you may have performance constraints. Maybe a deeper network does gain some accuracy but takes so long to make predictions that it isn’t usable.

Final thought re. training and validation accuracy: the approach would be to observe both at the same time rather than first trying to optimize the training accuracy, because that could in and of itself lead to low validation accuracy (overfitting).