An issue with the dropout regularization implementation in the Week 1 lecture

It’s great that the discussion was useful. Thanks for confirming!

Another follow-up question that has come up before is whether the fact that dropout works in a given case implies that there is a smaller network we could have started with and trained without dropout, and that would have achieved the same “Goldilocks” balance between fitting the training data well and still generalizing to the test data. I don’t definitively know the answer, but it seems likely that this is true. The practical problem is that searching for that smaller architecture is far more expensive than simply applying dropout or another form of regularization to the larger network. Here’s a thread from a while ago that discusses this point in more detail.
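
For concreteness, here is a minimal sketch of how inverted dropout is typically implemented in a forward pass with numpy. The function name, the `keep_prob` value, and the layer shapes are illustrative assumptions on my part, not the exact code from the course notebook:

```python
import numpy as np

def forward_with_dropout(a_prev, W, b, keep_prob=0.8):
    """One hidden-layer forward step with inverted dropout (illustrative sketch).

    a_prev    : activations from the previous layer, shape (n_prev, m)
    W, b      : layer parameters
    keep_prob : probability of keeping each unit (example value, not prescribed)
    """
    z = W @ a_prev + b
    a = np.maximum(0, z)  # ReLU activation

    # Inverted dropout: zero out each unit with probability (1 - keep_prob),
    # then scale the survivors by 1/keep_prob so the expected activation
    # is unchanged.
    d = (np.random.rand(*a.shape) < keep_prob).astype(a.dtype)
    a = (a * d) / keep_prob
    return a, d  # the mask d is kept so backprop can zero the same units

# Hypothetical usage with small random shapes
a_prev = np.random.randn(4, 5)   # 4 units, 5 examples
W = np.random.randn(3, 4)
b = np.zeros((3, 1))
a, mask = forward_with_dropout(a_prev, W, b, keep_prob=0.8)
```

At test time the mask is simply omitted; because of the 1/keep_prob scaling during training, no extra rescaling is needed at inference.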