W3_A1_Ex-6_Layer size for hidden layers

Hi All,

When running "6 - Tuning hidden layer size (optional/ungraded exercise)" in the week 3 programming assignment, I get the same accuracy (90.5%) and the same plots for different hidden layer sizes.

The description says "The best hidden layer size seems to be around n_h = 5". Given that the accuracy and plots are the same, why is the best hidden layer size 5?


The goal is to achieve the best result (or at least a "good enough" result) at the minimum cost in compute, both for training and at prediction time. If a larger network with, for example, n_h = 20 gives the same accuracy as n_h = 5, then the larger network is extra cost without any benefit.

Note that the hidden layer sizes tried here used to include n_h = 20 and n_h = 50, but those were disabled because the extra notebook runtime for that training breaks the grader. You can try them locally in your notebook, but make sure to comment out the larger sizes before you submit to the grader.
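For reference, the tuning loop has this general shape. This is a hedged sketch, not the assignment's actual code: the assignment's `nn_model` and `predict` helpers and the planar dataset are replaced here by a small self-contained one-hidden-layer network (tanh hidden units, sigmoid output) trained on synthetic concentric-ring data, so the accuracy numbers it prints will differ from the notebook's.

```python
# Sketch of the hidden-layer-size tuning loop. Assumptions: nn_model,
# predict, and make_data below are stand-ins for the assignment's helpers
# and dataset, not the course's actual implementations.
import numpy as np

rng = np.random.default_rng(0)

def make_data(m=200):
    # Two noisy concentric rings, a rough stand-in for the planar dataset.
    t = rng.uniform(0, 2 * np.pi, m)
    r = np.where(rng.random(m) < 0.5, 1.0, 2.0)
    X = np.vstack([r * np.cos(t), r * np.sin(t)]) + 0.1 * rng.standard_normal((2, m))
    Y = (r > 1.5).astype(float).reshape(1, m)
    return X, Y

def nn_model(X, Y, n_h, num_iterations=2000, lr=0.5):
    # Train a 2 -> n_h -> 1 network with plain batch gradient descent.
    n_x, m = X.shape
    W1 = rng.standard_normal((n_h, n_x)) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = rng.standard_normal((1, n_h)) * 0.01
    b2 = np.zeros((1, 1))
    for _ in range(num_iterations):
        # Forward pass: tanh hidden layer, sigmoid output.
        A1 = np.tanh(W1 @ X + b1)
        A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
        # Backward pass for binary cross-entropy loss.
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.mean(axis=1, keepdims=True)
        dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.mean(axis=1, keepdims=True)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2)

def predict(params, X):
    W1, b1, W2, b2 = params
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
    return (A2 > 0.5).astype(float)

X, Y = make_data()
# The larger sizes are commented out to keep runtime low, just as the
# notebook now does for the grader; uncomment them to run locally.
for n_h in [1, 2, 3, 4, 5]:  # , 20, 50]:
    params = nn_model(X, Y, n_h)
    accuracy = float((predict(params, X) == Y).mean()) * 100
    print(f"Accuracy for {n_h} hidden units: {accuracy} %")
```

Since n_h only changes the shapes of W1, b1, and W2, the same loop body works unchanged for every size; only the training time grows with n_h.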


I went ahead and ran the full set of sizes that used to be included, and here are the results I got:

Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.25 %
Accuracy for 20 hidden units: 90.75 %
Accuracy for 50 hidden units: 90.25 %

So you can see that 5 does give slightly better results, while 20 or 50 definitely does not help and takes much longer to train.

Of course this is not a complete survey. You could also try 6, 7, and 8 to see how those work, and you could experiment with different learning rates and numbers of iterations.


Thank you for the explanation and running the full set.

After looking at the results you shared, I reviewed my code and found an issue. After fixing it, I can reproduce the same results.