Regularization Dropout Programming Assignment: How to interpret test accuracy being higher than training accuracy


I just finished the regularization programming assignment. One interesting thing is that with dropout, the training accuracy is ~92% while the test accuracy is ~95%. My naive intuition is that test accuracy should theoretically be <= training accuracy. If test accuracy > training accuracy, it might just be a lucky train/test split in which the training set contains more noise. Is this intuition correct? How should I interpret test accuracy > training accuracy?

Any thoughts and discussion are welcome!

Yeah, I think your intuition is correct. It's just that the test split happens to contain a lot of easy examples that look familiar from the training phase.


If the train and test data come from the same distribution, this behavior becomes increasingly unlikely as both data sets get larger. Conversely, the smaller a data set is, the more its accuracy can deviate from the mean, which can lead to test accuracy being higher than training accuracy. You can systematically reshuffle the data between the train and test sets and see whether the result persists.
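A minimal sketch of that reshuffling check, using only NumPy. Everything here is an assumption for illustration: the data are synthetic, a nearest-centroid classifier stands in for the assignment's network, and the helper names `nearest_centroid_accuracy` and `reshuffle_gaps` are hypothetical. It repeats the split many times and records test accuracy minus training accuracy.

```python
import numpy as np

def nearest_centroid_accuracy(X_train, y_train, X_eval, y_eval):
    """Fit one centroid per class on the training split, then score X_eval."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared distance from every evaluation point to every centroid.
    dists = ((X_eval[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == y_eval).mean())

def reshuffle_gaps(X, y, n_test, n_repeats=200, seed=0):
    """Return (test accuracy - train accuracy) for many random splits."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(X))
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        X_tr, y_tr = X[train_idx], y[train_idx]
        train_acc = nearest_centroid_accuracy(X_tr, y_tr, X_tr, y_tr)
        test_acc = nearest_centroid_accuracy(X_tr, y_tr, X[test_idx], y[test_idx])
        gaps.append(test_acc - train_acc)
    return np.array(gaps)

# Synthetic two-class data (an assumption, not the assignment's data set).
data_rng = np.random.default_rng(42)
y = data_rng.integers(0, 2, size=300)
X = data_rng.normal(size=(300, 2)) + 1.5 * y[:, None]

gaps_small = reshuffle_gaps(X, y, n_test=20)    # tiny test set
gaps_large = reshuffle_gaps(X, y, n_test=150)   # much larger test set

for name, gaps in [("n_test=20", gaps_small), ("n_test=150", gaps_large)]:
    print(f"{name}: test > train in {(gaps > 0).mean():.0%} of splits, "
          f"gap std = {gaps.std():.3f}")
```

If test > train shows up in only a fraction of the splits, the original result was probably a lucky split. If it persists across nearly all of them, something systematic is going on.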

However, a different thought, and this is only my guess: strictly speaking, because of dropout, the (averaged) NN used in training is smaller than the one used in testing. So it might not be THAT surprising that the larger network performs better, as it uses many more units than the one used for training. Even though training found the correct weights for all of the units, on average the NN with fewer units under dropout performs worse (even on the training data) than the full network used in testing.
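To make that guess concrete, here is a small sketch that scores the same weights twice: once with a dropout mask active and once with the full network. The weights are hand-picked rather than trained, the data are synthetic, and `forward`, `accuracy`, and `keep_prob` are illustrative names, not necessarily those used in the assignment. Note that this effect can only explain the gap if the training accuracy was measured with dropout switched on, which is worth checking in the prediction code.

```python
import numpy as np

def forward(X, W1, W2, keep_prob=1.0, rng=None):
    """One-hidden-layer forward pass; dropout is active only if keep_prob < 1."""
    A1 = np.maximum(0.0, X @ W1)                 # ReLU hidden layer
    if keep_prob < 1.0:
        mask = rng.random(A1.shape) < keep_prob  # keep each unit with prob. keep_prob
        A1 = A1 * mask / keep_prob               # inverted dropout: rescale kept units
    return 1.0 / (1.0 + np.exp(-(A1 @ W2)))      # sigmoid output, shape (n, 1)

def accuracy(probs, y):
    """Fraction of samples where thresholding at 0.5 matches the 0/1 label."""
    return float(((probs[:, 0] > 0.5) == (y == 1)).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X.sum(axis=1) > 0).astype(int)              # label: is x0 + x1 positive?

# Hand-picked (not trained) weights: 4 units fire on a positive sum, 4 on a negative one.
W1 = np.hstack([np.ones((2, 4)), -np.ones((2, 4))])   # shape (2, 8)
W2 = np.vstack([np.ones((4, 1)), -np.ones((4, 1))])   # shape (8, 1)

acc_full = accuracy(forward(X, W1, W2), y)                             # all units, as at test time
acc_dropout = accuracy(forward(X, W1, W2, keep_prob=0.5, rng=rng), y)  # random half of the units
print(f"full network: {acc_full:.3f}, with dropout mask: {acc_dropout:.3f}")
```

With these weights the full network separates the classes perfectly, while the masked pass misclassifies a positive sample whenever all four of its active units happen to be dropped.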

If anything, worse performance in training might be an indicator that the data are not being overfitted, and is therefore indicative of the quality of the NN. After all, size reduction by dropout and overfitting have opposite effects on training performance, no?