Just finished the second assignment, where the 4-layer model improved accuracy for recognizing cat photos to 80%. I then ran the same program with an 8-layer model:
layers_dims = [12288, 30, 20, 15, 10, 7, 5, 1]
The accuracy dropped to 56%. I can understand why a larger model might not improve accuracy, but why would it decrease it? I understand that the number of layers is a hyperparameter that needs tuning, but with so many hyperparameters to optimize, deep learning seems more like voodoo magic than a logical framework. Will subsequent courses help refine the approach?
To answer the big question first: yes, choosing hyperparameters in a systematic way is a major subject of Course 2 in this series, so please stay tuned. This is only the first course, and there are simply too many topics to cover everything here.
It’s great that you are extending the course material by trying your own experiments! That is an excellent way to deepen your understanding, and you always learn something from it.

You’re right that you would expect a deeper network to be able to represent a more complex function and do better on this task. But a more complex network may also take longer to train, and there is no guarantee that the learning rate used with the 4-layer network will give good convergence with your deeper one. So more experimentation is required before you can conclude that the bigger network didn’t work the way you expected. Try doubling the number of iterations and then look at the cost values you are getting: do they continue to decrease with more iterations?

One other subtle point: cost values are not comparable between different models. The cost is only useful within a given model to see whether convergence is working or not. The real metric for performance is prediction accuracy.
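To make the "train longer and watch the cost" idea concrete, here is a minimal sketch using plain NumPy logistic regression on synthetic data (not the course's `L_layer_model`; the function name and data here are purely illustrative). It shows that a run which looks "stuck" at a mediocre cost may simply not have converged yet, and that doubling the iterations can reveal the cost is still dropping:

```python
import numpy as np

def train_logreg(X, y, lr, iters):
    """Minimal gradient-descent logistic regression; returns cost history."""
    w = np.zeros(X.shape[0])
    b = 0.0
    m = X.shape[1]
    costs = []
    for i in range(iters):
        z = w @ X + b                      # forward pass
        a = 1.0 / (1.0 + np.exp(-z))       # sigmoid activation
        # cross-entropy cost (epsilon added for numerical safety)
        cost = -np.mean(y * np.log(a + 1e-8) + (1 - y) * np.log(1 - a + 1e-8))
        dw = (X @ (a - y)) / m             # gradients
        db = np.mean(a - y)
        w -= lr * dw                       # gradient-descent update
        b -= lr * db
        if i % 100 == 0:
            costs.append(cost)
    return costs

# Synthetic, linearly separable toy data (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
true_w = rng.standard_normal(5)
y = (true_w @ X > 0).astype(float)

short_run = train_logreg(X, y, lr=0.1, iters=500)
long_run = train_logreg(X, y, lr=0.1, iters=1000)

# If the cost at 1000 iterations is still lower than at 500, the model
# had simply not converged yet -- it was not fundamentally worse.
print(f"cost after 500 iters:  {short_run[-1]:.4f}")
print(f"cost after 1000 iters: {long_run[-1]:.4f}")
```

The same diagnostic applies to your 8-layer run: plot or print the cost curve, double the iteration budget, and try a few learning rates before drawing conclusions about the architecture itself.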