DLS1 week 4 assignment 2: 2 hidden layers VS 4 - sharing views


I have completed all of the assignments for week 4 and I have a question about the last one.
We build a classifier with a 2-layer model and test it on a set of data, then we build another classifier with a 4-layer model and compare how well it classifies the same set of images. Reading through the assignment, I get the feeling it is telling us that a 4-layer model is the better choice for accuracy and speed.
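For anyone who wants to replay this comparison outside the course notebook, here is a minimal sketch using scikit-learn as a stand-in for the numpy networks built in the assignment. The dataset, hidden-layer sizes, and hyperparameters here are all made up for illustration; the course's own implementation and data are different.

```python
# Quick depth comparison on synthetic data (hypothetical setup, not
# the course's numpy implementation or its cat/non-cat dataset).
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# One hidden layer vs. three hidden layers, loosely mirroring the
# "2-layer" vs "4-layer" comparison in the assignment.
for hidden in [(7,), (20, 7, 5)]:
    clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=3000, random_state=1)
    clf.fit(X_tr, y_tr)
    print(hidden,
          "train:", clf.score(X_tr, y_tr),
          "test:", clf.score(X_te, y_te))
```

The interesting part is comparing the train/test scores for the two depths, just as the assignment does, rather than the absolute numbers.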
I have summarized some results in the following table:

| Model   | Training accuracy | Test accuracy |
| ------- | ----------------- | ------------- |
| 2-layer | 0.99              | 0.72          |
| 4-layer | 0.98              | 0.80          |

I have also plotted the cost function vs. the number of iterations for both models:

What I can draw from those tests is:

  • On the training set, the 2-layer model performs slightly better (is it significantly better? not sure).
  • Convergence is faster with the 2-layer model on the training set.
  • On the test set, the 4-layer model performs better, but the size of that set is only 50.

Even though the 4-layer model performs better on the test set, I would not be confident saying that it is the better option, given these training and test sets and these hyperparameters (learning rate, number of iterations). I suppose this raises the question of what the criterion of success is: the speed of training or the accuracy.

Do you have other arguments that could help show that the 4-layer model is the better option?
Thank you for sharing your views!

These are interesting and thoughtful questions. It’s great that you are going beyond just what is stated in the notebooks! We will learn a lot more about how to evaluate the performance of networks and decide what is “good enough” in Course 2 of this series, so definitely stay tuned for that. But here are some points to consider:

For starters, the training set performance is a mixed blessing. What we have here is a classic example of “overfitting” in both the 2-layer and 4-layer cases. As a general matter, 99% training accuracy is not something to celebrate unless the test accuracy is very close to that. And in this case, the 2-layer net overfits much worse than the 4-layer net, so you could conclude that the 4-layer net is a somewhat more general solution. Overfitting and how to cope with it will be a major topic in Course 2.
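Overfitting is easiest to see in a model small enough to run in a few lines. The sketch below is not the course's network; it is a polynomial curve fit on made-up data, but it shows the same signature: near-zero training error paired with much worse error on held-out points.

```python
# Toy overfitting demo: fit noisy samples of sin(2*pi*x) with a
# modest polynomial and with one flexible enough to memorize the data.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 10)
x_test = np.linspace(0.05, 0.95, 10)        # held-out points
y_test = np.sin(2 * np.pi * x_test)

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

The degree-9 polynomial passes through all 10 training points almost exactly (train MSE near zero), yet does worse between them, which is exactly the train/test gap being discussed here.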

But the other really important thing to emphasize is that the datasets we are working with here are unrealistically small for a problem this difficult. It turns out they are very carefully curated to get pretty decent results, but in a “real world” application to recognize cat pictures or some similar image classification problem you’d need far more data samples to get good results. Even in something much simpler like the MNIST handwritten digit recognition problem, the “real” dataset has 60k samples. Here we have just 209 training samples and 50 test samples. The small datasets are just an unfortunate consequence of the limitations of the online notebook environment that the course has to work with.

In terms of weighing the importance of training cost versus accuracy, my belief is that accuracy has to be considered the real point of all this, but it depends on what the goals of your system are. What are the costs of a wrong answer? If you’re doing something where a wrong answer has no serious consequences (predicting what some browser user is going to click on next), then maybe accuracy is not such a big deal. But if you’re building a system in which a wrong answer has potentially serious consequences (e.g. giving a False Negative reading when analyzing a CT scan for tumors), then you have to try for the highest accuracy that is possible, even if that is very costly in terms of the data storage and CPU/GPU cycles required to do the training. So I guess you could say that the real answer is “it depends”. :nerd_face:
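To make the “it depends” point concrete, here is a toy sketch (entirely hypothetical labels and predictions) of how plain accuracy can hide exactly the False Negatives that matter in a medical setting:

```python
# Toy screening task where false negatives are the costly error.
# Labels: 1 = tumor present, 0 = absent (made-up data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two of the four tumors

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = tp / (tp + fn)  # fraction of real tumors actually caught

print(f"accuracy = {accuracy:.2f}")  # 0.80 -- looks respectable
print(f"recall   = {recall:.2f}")    # 0.50 -- half the tumors missed
```

An 80% accurate model sounds fine until you notice it only catches half the tumors, which is why the right success criterion depends on the application.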


Hi @paulinpaloalto,

Thank you for this enlightening answer! I understand that in a real-world case the sample size would be much bigger, but for the sake of the course it had to be much smaller.
I cannot yet see why there is overfitting in both models, but I am keen to learn about it in the next course.
Thank you for sharing your views about training cost vs. accuracy. As you mention, it would depend on each particular situation.

Looking forward to the next module,



The definition of overfitting is that the model does very well on the training data (0.99 and 0.98 accuracy in our example here), but does significantly less well on the test data (0.72 and 0.80 accuracies in this case). In other words, the models are too specific to the training data and do not “generalize” well to any other cases. The test data represents the “real” inputs for which the performance of the model will be judged.
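One way to make that comparison concrete is to look at the generalization gap (train accuracy minus test accuracy) for each model, using the figures quoted above:

```python
# The accuracy figures quoted in this thread.
results = {
    "2-layer": {"train": 0.99, "test": 0.72},
    "4-layer": {"train": 0.98, "test": 0.80},
}

# Generalization gap: how much worse the model does off the training set.
gaps = {name: r["train"] - r["test"] for name, r in results.items()}
for name, gap in gaps.items():
    print(f"{name}: generalization gap = {gap:.2f}")
```

The 2-layer net's larger gap (0.27 vs. 0.18) is what marks it as overfitting worse, even though its raw training accuracy is the higher of the two.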

Please stay tuned: this will be a major topic in the first week of Course 2. Prof Ng will say a lot more about problems like overfitting and underfitting and how we can address them.
