Hi,
In the last part of the assignment, we see different training accuracies when we try different numbers of hidden nodes, and the notebook describes this as a sign of overfitting as the number of hidden layer nodes increases.
I understand that overfitting can happen if there are too many hidden layer nodes. However, why would we observe overfitting without any test data? Normally, overfitting means we get nearly perfect predictions on the training data but worse predictions on the test data. As the number of hidden layer nodes increases, why would the training accuracy itself get worse?
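Just to make sure we are talking about the same thing, the check I have in mind looks something like this (a hypothetical sketch, not from the notebook; nn_model, predict, X, and Y are assumed to be the notebook's helpers and training data, with examples as columns):

```python
import numpy as np

# Hypothetical overfitting check: hold out part of the data,
# train on the rest, and compare the two accuracies.
m = X.shape[1]
split = int(0.8 * m)
X_train, X_test = X[:, :split], X[:, split:]
Y_train, Y_test = Y[:, :split], Y[:, split:]

parameters = nn_model(X_train, Y_train, n_h=50, num_iterations=10000)
train_acc = float(np.mean(predict(parameters, X_train) == Y_train)) * 100
test_acc = float(np.mean(predict(parameters, X_test) == Y_test)) * 100

# Overfitting would show up as train_acc much higher than test_acc.
print(f"train: {train_acc:.2f} %, test: {test_acc:.2f} %")
```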
Hope someone can correct my logic!
Many thanks.
This assignment is kind of a special case in that there is no “test” data, as you observe. So the definition of “overfitting” does not even apply: we want as perfect a fit as we can get. My guess is that the problem with very large numbers of hidden layer nodes is that the training becomes very expensive and perhaps the convergence is also more difficult, meaning that you need a more sophisticated strategy with dynamically managed learning rates or the like. It’s not a given that convergence will work the same with different numbers of neurons: you may have to tweak other hyperparameters like number of iterations and learning rate.
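As a concrete example of what I mean by a dynamically managed learning rate, here is a minimal sketch of inverse-time decay; the function name and decay constant are purely illustrative, not anything from the notebook:

```python
def decayed_learning_rate(initial_rate, iteration, decay=0.001):
    # Inverse-time decay: start with a large step size for fast early
    # progress, then shrink it so the later updates can settle.
    return initial_rate / (1.0 + decay * iteration)

# e.g. recompute the rate inside the gradient-descent loop:
# learning_rate = decayed_learning_rate(1.2, i)
```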
I just checked the results that I got in that section of the notebook and here they are:
Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.25 %
Accuracy for 20 hidden units: 90.75 %
Accuracy for 50 hidden units: 90.25 %
I would say that the difference between 90.25% and 91.25% accuracy is basically in the noise, but you could try some experiments running more iterations in the n = 20 and n = 50 cases and see if the accuracy improves.
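Something along these lines would do it (a sketch, assuming the notebook's nn_model and predict helpers, its training data X and Y, and that nn_model accepts a num_iterations argument):

```python
import numpy as np

# Sketch: rerun the larger networks with progressively more
# iterations and watch whether the training accuracy improves.
for n_h in (20, 50):
    for num_iterations in (10000, 20000, 40000):
        parameters = nn_model(X, Y, n_h, num_iterations=num_iterations)
        predictions = predict(parameters, X)
        accuracy = float(np.mean(predictions == Y)) * 100
        print(f"n_h = {n_h}, iterations = {num_iterations}: {accuracy:.2f} %")
```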
Thanks Paul! Your explanation makes sense to me.
But it would be better if the Week 3 assignment could be revised in this section, because the wording about “overfitting” is kind of confusing (even though this part is optional).
That’s a good point. I had forgotten the comments that they make about overfitting. I think they are speaking “in general”, meaning that in the normal case, where you are training a model that is intended to apply to multiple different inputs, a network with a large number of nodes will tend to overfit unless you apply regularization.
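For reference, L2 regularization just adds a weight penalty to the cost; here is a hypothetical sketch, not something this assignment implements, assuming the parameters are stored under keys like "W1" and "W2":

```python
import numpy as np

# Hypothetical L2-regularized cost: the penalty grows with the
# squared weights, discouraging a large network from fitting the
# training noise too closely.
def compute_cost_with_l2(cross_entropy_cost, parameters, lambd, m):
    l2_penalty = (lambd / (2 * m)) * (
        np.sum(np.square(parameters["W1"])) +
        np.sum(np.square(parameters["W2"]))
    )
    return cross_entropy_cost + l2_penalty
```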
But the specific case here is not learning a “general” model, so that concept doesn’t really apply. I’ll file an issue with the course staff and hope that they can come up with some better wording in that section.
Thank you for pointing that out!