Neural network with one hidden layer applied to cat dataset

I used the neural network with one hidden layer and n units to classify "cat" vs. "not cat" on the data from the first assignment. I discovered that I can't improve the accuracy by adding more units to the hidden layer.

What is the reason for that? Why can we get 90% accuracy identifying the blue and red regions in the flower dataset, but not similar accuracy for cat identification?


Hi, Ruben.

It is great that you are trying experiments like this! You always learn something when you take the ideas in the course and apply them yourself.

At a high level, the answer is that not all problems are of equal difficulty. In the Planar Data case, we have literally 2 features to deal with, as opposed to 12288 features (64 x 64 x 3) in the "see the cat" case. Just at an intuitive level, think about what it takes to see the geometries in the flower dataset, versus what your brain actually has to do to recognize a cat in a 64 x 64 RGB image. Big difference, right? Well, that applies to the neural networks, too. So there is no reason to expect that any two given classification problems will be solvable with the same network architecture. As with pretty much everything in this space, it all depends on how complex your problem actually is.
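To make that feature-count comparison concrete, here's a quick numpy sketch (the "image" here is just a zeros placeholder, not real data from the assignment):

```python
import numpy as np

# Planar Data case: each example is just an (x, y) coordinate -> 2 features
planar_example = np.array([0.3, -1.2])
print(planar_example.shape)  # (2,)

# "See the cat" case: each example is a 64 x 64 RGB image.
# Flattening it into a column vector gives 64 * 64 * 3 = 12288 features.
cat_image = np.zeros((64, 64, 3))        # placeholder for one RGB image
cat_features = cat_image.reshape(-1, 1)  # flatten to a (12288, 1) column
print(cat_features.shape)                # (12288, 1)
```

So each cat example hands the network more than 6000 times as many input numbers as a planar example, and the decision boundary it must learn lives in that much higher-dimensional space.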

But with that said, there are also some things to check at the level of how you implemented your solution. Did you use the same network architecture as in the Planar Data case, with tanh as the hidden layer activation? Since the data is more complex, you may also need quite a bit of experimentation with the learning rate and number of iterations to make sure you are actually getting an equivalent level of convergence in the "cat" case. You can't just apply the same learning rate and number of iterations as in the Planar Data case and assume that will give you an equivalent quality of solution.

It sounds like you did some experimentation with the number of neurons in the hidden layer, which is probably the most important "hyperparameter" here (that's the term Prof Ng uses for attributes that you simply have to choose, as opposed to "parameters", which can be learned through back propagation). How much difference in prediction accuracy did you see as you varied the number of hidden layer nodes? And when you quote accuracy values, are you using the training set, the test set, or both? Note that the Planar Data assignment is anomalous in that there is no concept of "test data" there: there's just the one dataset.
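If it helps, here's a minimal sketch of that kind of experiment, assuming you kept the `nn_model(X, Y, n_h, num_iterations)` and `predict(parameters, X)` functions from the Planar Data notebook (they derive the input size from the data, so they work unchanged on 12288 features) and loaded the flattened, normalized cat data into `train_x`, `train_y`, `test_x`, `test_y`. Those variable names and the exact values swept are just placeholders, not the course code:

```python
import numpy as np

# Sweep the hidden layer size and report accuracy on both splits,
# so train and test numbers can be compared side by side.
for n_h in [4, 10, 20, 50]:
    parameters = nn_model(train_x, train_y, n_h, num_iterations=5000)
    for name, X, Y in [("train", train_x, train_y),
                       ("test", test_x, test_y)]:
        preds = predict(parameters, X)          # shape (1, m), values 0/1
        acc = np.mean(preds == Y) * 100
        print(f"n_h = {n_h:3d}  {name} accuracy: {acc:.1f}%")
```

Printing both splits for each setting also answers the question above about which accuracy you're actually quoting, and a large train/test gap tells you something different than a low accuracy on both.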

But with all the above said, it's probably worth it to just "hold that thought" and cruise ahead to Week 4. There Prof Ng shows us how to build a fully general neural network and then apply it to the cat recognition problem with the same Week 2 dataset. In the second assignment, you'll get to try a 2-layer net and a 4-layer net on the problem and see how well they do. Then you can take the code and run your own experiments to see if you can get even better performance. Onwards! :nerd_face:
