Neural networks and image recognition

I’m really enjoying the course. I have a question about the image recognition example and how forward propagation through the network operates. As presented, the first layer of the network works out a rough idea of whether edges match its training, the next layer assembles those edges into object parts (e.g. eyes, nose, mouth), and the layer after that checks whether the assembled parts are in the right places (e.g. eyes above the mouth). This makes sense from a linguistic or organizational standpoint, but how does it make sense from a mathematics and neural network standpoint?

What’s the evidence that this is really what happens in each of those layers?
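To make the “first layer detects edges” idea concrete mathematically: a conv filter’s output is just a sliding dot product, so a filter whose weights form an edge pattern responds strongly only where the input contains a matching edge. Here is a small pure-NumPy sketch of that idea; the toy image and the hand-written filter are made up for illustration (trained first-layer filters often *converge* to edge-like patterns, they are not hand-coded):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D "valid" cross-correlation, as in a ConvNet layer."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image with a vertical edge: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-crafted vertical-edge filter (hypothetical, for illustration only).
vertical_edge = np.array([[-1., 0., 1.],
                          [-1., 0., 1.],
                          [-1., 0., 1.]])

response = conv2d_valid(image, vertical_edge)
print(response)  # large values only in the columns straddling the edge
```

The output is zero over the flat dark and bright regions and large (3.0) exactly where the filter window straddles the edge, which is the mathematical sense in which "this layer detects edges": the activation map is high only where the input locally correlates with the filter pattern.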


My impression is that what Professor Ng says there is not meant to be so precise as describing exactly what happens in each particular layer. Later in the course (in Week 4) there is a lecture describing work that instruments the internal layers of ConvNets to discern what they are recognizing. It would be worth holding that thought until you get to Week 4, or you could skip ahead and watch the lecture “What are Deep ConvNets Learning?” as a preview to see whether it sheds light on your current question.


Yes, there is evidence; as Paul points out, it is possible to visualize the features that each layer is extracting.

This can also be seen in the TensorFlow: Advanced Techniques Specialization (I think it’s Course 3), where they use Grad-CAM to check which features are being extracted in each layer of an image recognition ConvNet.
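For reference, the core Grad-CAM computation mentioned above is quite small: global-average-pool the gradients of the class score over each channel of a conv layer to get per-channel weights, then take a ReLU of the weighted sum of the activation channels. This is a hypothetical NumPy sketch of that idea, assuming you have already extracted one conv layer’s activations and the corresponding gradients (the names, shapes, and toy values below are my own, not from the course):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap.

    activations: (H, W, C) feature maps from one conv layer.
    gradients:   (H, W, C) gradient of the class score w.r.t. activations.
    """
    weights = gradients.mean(axis=(0, 1))                      # (C,) per-channel importance
    cam = np.tensordot(activations, weights, axes=([2], [0]))  # (H, W) weighted sum
    return np.maximum(cam, 0.0)                                # ReLU: keep positive evidence

# Toy example: channel 0 fires at top-left, channel 1 at bottom-right.
acts = np.zeros((2, 2, 2))
acts[0, 0, 0] = 1.0
acts[1, 1, 1] = 1.0

# Pretend the class score rises with channel 0 and falls with channel 1.
grads = np.stack([np.full((2, 2), 0.5),
                  np.full((2, 2), -0.5)], axis=-1)

cam = grad_cam(acts, grads)
print(cam)  # highlights only the top-left location
```

In a real TensorFlow workflow the activations and gradients would come from `tf.GradientTape` on a model that outputs the chosen conv layer, but the heatmap math itself is just these three lines.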
