Quiz-Practical aspects of Deep Learning

Hi Community.

I have a question, probably because I have not fully understood what the instructor says:

  • In the video “Basic Recipe for Machine Learning”, at around minute 4:41, we can see that in order to fix high variance on the dev/test set, one should get more data. This might sound silly, but are we talking about more data for the training set or for the dev/test set? If it is for the dev/test set, then why does the quiz question “If your Neural Network model seems to have high variance, what of the following would be promising things to try?” mark the answer “Get more test data” as wrong?

  • In the same video frame as above, another solution presented for reducing variance is to “Try another Neural Architecture” (as said at 2:50 min of the video). If this is the case, what exactly does it mean to try another neural architecture? I ask because my guess was that it referred to adding more neurons to each layer and/or adding more layers to the neural network. But these options were marked wrong in the same quiz question, “If your Neural Network model seems to have high variance, what of the following would be promising things to try?”

Thank you in advance!


Hey @Ricardo_Gomes,

Here, we are talking about getting more data for the training set, not for the dev/test set. Let’s reason about it. Assume the training set stays the same and we keep adding data to the dev set, and that the original errors were 10% (training) and 20% (cross-validation/dev). Since the training set is unchanged, the model is unchanged, so the training error remains 10%. Adding dev data does not change the model either; it only gives a more reliable estimate of its dev error. The model will generalize just as poorly on the new examples, and if the larger dev set covers a wider span of the data distribution, the dev error may even go up. Either way, the gap between the training and dev errors, i.e., the variance, does not shrink. More training data, on the other hand, changes what the model learns and can help it generalize. I guess that answers your first query.
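To make this concrete, here is a minimal sketch in plain Python. It is not from the course: the toy data (`make_data`), the piecewise-constant "model" (`fit_bins`) and all the sizes are assumptions chosen only for illustration, and a small overfit regressor stands in for a neural network. It evaluates the same model on a small and a large dev set, then retrains on a much larger training set and compares the train/dev gap.

```python
import math
import random

random.seed(0)
NOISE_STD = 0.3  # assumed noise level of the toy problem
N_BINS = 40      # deliberately too flexible for 60 training points


def make_data(n):
    """Sample n noisy points from y = sin(2*pi*x), with x in [0, 1)."""
    xs = [random.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + random.gauss(0.0, NOISE_STD) for x in xs]
    return xs, ys


def fit_bins(xs, ys, n_bins):
    """Piecewise-constant regressor: predict the mean target of each bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    fallback = sum(ys) / len(ys)  # used for bins with no training points
    means = [s / c if c else fallback for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def mse(model, xs, ys):
    """Mean squared error of the model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Same small training set, two dev sets of very different size.
x_train, y_train = make_data(60)
x_dev_small, y_dev_small = make_data(50)
x_dev_large, y_dev_large = make_data(5000)

model = fit_bins(x_train, y_train, N_BINS)
train_err = mse(model, x_train, y_train)
dev_err_small = mse(model, x_dev_small, y_dev_small)
dev_err_large = mse(model, x_dev_large, y_dev_large)

# Now grow the training set instead and retrain.
x_train_big, y_train_big = make_data(6000)
model_big = fit_bins(x_train_big, y_train_big, N_BINS)
train_err_big = mse(model_big, x_train_big, y_train_big)
dev_err_big = mse(model_big, x_dev_large, y_dev_large)

gap_small_train = dev_err_large - train_err
gap_big_train = dev_err_big - train_err_big

print(f"small train set: train={train_err:.3f} "
      f"dev(50)={dev_err_small:.3f} dev(5000)={dev_err_large:.3f}")
print(f"large train set: train={train_err_big:.3f} dev(5000)={dev_err_big:.3f}")
print(f"gap: {gap_small_train:.3f} -> {gap_big_train:.3f}")
```

In this sketch, enlarging the dev set leaves the model and its training error untouched and only changes how precisely the dev error is measured, while enlarging the training set is what closes the gap.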

Note that when Prof Andrew mentions Neural Network Architecture Search, he refers to finding a more suitable architecture, which covers searching among both more complex and less complex architectures. For reducing bias, moving towards a more complex architecture might help, since the resulting model might fit the data better thanks to its increased capacity. For reducing variance, on the other hand, moving towards a less complex architecture might be the way to go: the model may have more capacity than the dataset warrants, so reducing its complexity brings it closer to the level of the dataset. That is why adding more neurons or more layers is marked wrong in that quiz question. I hope this answers your second query.
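As a rough illustration of the complexity point, here is a small plain-Python sketch. It is not from the course: the toy data and the piecewise-constant regressor (`fit_bins`) are assumptions, with the number of bins playing the role that layers and neurons play in a neural network. Both models are fit on the same small training set, and the train/dev gap is compared.

```python
import math
import random

random.seed(1)
NOISE_STD = 0.3  # assumed noise level of the toy problem


def make_data(n):
    """Sample n noisy points from y = sin(2*pi*x), with x in [0, 1)."""
    xs = [random.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + random.gauss(0.0, NOISE_STD) for x in xs]
    return xs, ys


def fit_bins(xs, ys, n_bins):
    """Piecewise-constant regressor; more bins means a more complex model."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    fallback = sum(ys) / len(ys)  # used for bins with no training points
    means = [s / c if c else fallback for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def mse(model, xs, ys):
    """Mean squared error of the model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


x_train, y_train = make_data(60)
x_dev, y_dev = make_data(5000)

# "Complex" architecture: 40 bins for only 60 training points.
complex_model = fit_bins(x_train, y_train, 40)
train_complex = mse(complex_model, x_train, y_train)
dev_complex = mse(complex_model, x_dev, y_dev)

# "Simpler" architecture: 10 bins on the same training data.
simple_model = fit_bins(x_train, y_train, 10)
train_simple = mse(simple_model, x_train, y_train)
dev_simple = mse(simple_model, x_dev, y_dev)

gap_complex = dev_complex - train_complex
gap_simple = dev_simple - train_simple

print(f"40 bins: train={train_complex:.3f} dev={dev_complex:.3f} gap={gap_complex:.3f}")
print(f"10 bins: train={train_simple:.3f} dev={dev_simple:.3f} gap={gap_simple:.3f}")
```

The more complex model fits the training set at least as well, but the simpler one shows a smaller train/dev gap in this setup. Making it simpler still would eventually raise both errors, which is the bias side of the trade-off.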

Let me know if this helps.


Thank you for your clarification :v:
