When training with only one hidden layer we got a training accuracy of 0.99 and a testing accuracy of around 0.72. So can I say that this model is overfitting? If so, shouldn't we be decreasing the number of features instead of increasing it? I.e., decreasing, rather than increasing, the number of hidden layers?
P.S. When increasing the number of hidden layers, the testing accuracy increases to 0.8. Isn't that contradictory to the belief that when a model is overfitting we should decrease, rather than increase, the number of parameters?
Hi @Anish_Sarkar1,
You are right that it looks like overfitting. If we close our eyes and do not inspect the training process and the evaluation process, a logical conclusion is to reduce the complexity of the network. However, that inspection is exactly what we are going to discuss here:
-
Note that there is this line in the notebook:
Knowing that the current architecture is not doing its best because of too many iterations, if we trained this 2-layer model with the right number of iterations, and the 4-layer model with its own right number of iterations, how would the numbers change?
We need to hold our doubt and ask these questions because your points are based on 0.72 and 0.8. If picking other numbers of iterations would change those scores, would our conclusion still be valid? A sketch of what "picking the right number of iterations" can look like follows below.
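To make this concrete, here is a rough, self-contained sketch (synthetic data and a plain logistic regression, not the notebook's model; all names and numbers below are my own) of how tracking held-out accuracy during training lets us choose a reasonable number of iterations:

```python
# A small sketch of "finding the right number of iterations": evaluate on
# held-out data as training progresses and note where its accuracy peaks.
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic binary-classification problem, deliberately small like the
# cat/non-cat dataset, so the train/test gap shows up quickly.
n_train, n_test, n_features = 209, 50, 200
X_train = rng.normal(size=(n_train, n_features))
w_true = rng.normal(size=n_features)
y_train = (X_train @ w_true + rng.normal(scale=5.0, size=n_train) > 0).astype(float)
X_test = rng.normal(size=(n_test, n_features))
y_test = (X_test @ w_true + rng.normal(scale=5.0, size=n_test) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accuracy(w, b, X, y):
    return np.mean((sigmoid(X @ w + b) > 0.5) == y)

w, b = np.zeros(n_features), 0.0
lr, num_iterations = 0.1, 3000
best = (0.0, 0)  # (best held-out accuracy, iteration at which it occurred)

for i in range(1, num_iterations + 1):
    p = sigmoid(X_train @ w + b)
    w -= lr * (X_train.T @ (p - y_train) / n_train)
    b -= lr * np.mean(p - y_train)
    if i % 100 == 0:
        test_acc = accuracy(w, b, X_test, y_test)
        if test_acc > best[0]:
            best = (test_acc, i)
        print(f"iter {i:4d}  train acc {accuracy(w, b, X_train, y_train):.2f}  "
              f"held-out acc {test_acc:.2f}")

print(f"best held-out accuracy {best[0]:.2f} at iteration {best[1]}")
```

In a proper setup the curve would be tracked on a separate validation set rather than the test set, but the idea is the same: comparing the 2-layer and 4-layer models is only fair if each is given the number of iterations that suits it.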
-
Note that the testing set has only 50 samples. This is a very small dataset: each sample contributes 2% to the accuracy score. You compared 0.72 to 0.8, which is a difference of only 4 samples. From a statistical point of view, concluding that there is an improvement based on such a small difference (4 samples) is not very convincing.
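As a rough illustration (this is my own back-of-the-envelope calculation, not part of the notebook), we can put a normal-approximation confidence interval around an accuracy measured on 50 samples:

```python
import math

def accuracy_ci(acc, n, z=1.96):
    """Approximate 95% confidence interval for an accuracy measured on n
    samples, using the normal approximation to the binomial (a rough sketch)."""
    margin = z * math.sqrt(acc * (1 - acc) / n)
    return acc - margin, acc + margin

n_test = 50  # size of the test set in this assignment
for acc in (0.72, 0.80):
    lo, hi = accuracy_ci(acc, n_test)
    print(f"accuracy {acc:.2f} on {n_test} samples -> 95% CI roughly ({lo:.2f}, {hi:.2f})")
```

The two intervals, roughly (0.60, 0.84) and (0.69, 0.91), overlap heavily, which is another way of saying that a 4-sample difference on a 50-sample test set is well within the noise.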
@Anish_Sarkar1, reducing the complexity of the network can deal with an overfitting problem. However, the numbers 0.72 and 0.8 might not be reliable because of the unjustified number of iterations, and the sample size may simply be too small for you to draw any convincing conclusion that the 4-layer model made a significant improvement.
Cheers,
Raymond
Got it, thanks for the clarification. So if I disregard my comment about the test set accuracy going up from 0.72 to 0.8, would my first point still be valid, i.e., that the model is overfitting? If so, shouldn't we be reducing the number of hidden layers, i.e., the model complexity?
Or, since the test accuracy still went up (however slightly), can we not draw the conclusion that the model is overfitting?
Again, thank you very much for the detailed explanation.
Then you are now comparing 0.72 with 0.99. We still have the same doubt about how reliable these numbers are. Again, if we close our eyes and take these numbers for granted, I would agree with you that it looks like overfitting, and a logical step is to reduce the complexity of the network. However, it is not good practice to ignore the reliability of the numbers we base our decisions on.
Usually these reliability issues go away when the dataset is large enough, and if you see numbers like these again with a large dataset, then it would be a good idea to try reducing the complexity (e.g. the number of layers) of the model.
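To see why a larger dataset helps, here is the same back-of-the-envelope approximation as above, showing how the uncertainty on a measured accuracy of 0.8 shrinks as the test set grows (the sizes are arbitrary examples):

```python
import math

# Margin of error (normal approximation) on a measured accuracy of 0.8
# for increasingly large test sets -- purely illustrative numbers.
for n in (50, 500, 5000):
    margin = 1.96 * math.sqrt(0.8 * 0.2 / n)
    print(f"n = {n:5d}  ->  0.80 +/- {margin:.3f}")
```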
Raymond
Okay, got it. Thanks for solving my doubt, it's been a huge help.
Raymond's covered everything here, but here's another thread that shows some experiments with rebalancing the training and test data in this exercise. The overall point is that the dataset used here is tiny compared to what you really need for meaningful performance on an image recognition task as complicated as this. You can actually flip the question around and ask whether they had to do anything clever to get 99% accuracy with a training set of only 209 examples. The answer from those other experiments seems to be that, yes, they had to adjust things very carefully.
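If you want to try rebalancing the split yourself, a minimal sketch (assuming the images and labels have been concatenated into arrays X and y; those names, the stand-in data, and the split sizes are mine, not from the notebook) could use scikit-learn's train_test_split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same overall size as the cat/non-cat dataset
# (209 + 50 = 259 examples); replace with the real, concatenated arrays.
X = np.random.rand(259, 64 * 64 * 3)
y = np.random.randint(0, 2, size=259)

# Re-split with a larger, stratified test set so each class is represented
# proportionally, which makes the test accuracy a bit more meaningful.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
print(X_train.shape, X_test.shape)
```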