I have completed the first two courses in the machine learning specialization, and am almost done with the reinforcement learning section of the third course.
However, I still don’t understand how different neurons and layers focus on different parts of the input data.
For example, in the lesson Example: Recognizing Images, Dr. Ng states that in the first layer one neuron might look for lines oriented in one direction whereas another neuron might look for lines oriented in another direction. Dr. Ng goes on to say that in the second layer the neurons will focus on identifying parts of a face at a larger scale.
I understand the basics of forward and back propagation, and I understand how in a convolutional neural network convolutions can be used to detect edges.
However, I still don’t understand how the neurons and layers come to specialize in different elements. For the neurons, I initially thought that creating each neuron with different random weights and b values could help explain the specialization, but I thought I read somewhere else that you can initialize all neurons with the same weights.
If anyone can help me understand how the neurons and layers know to focus on different things I would really appreciate the help!
The neurons don’t know what sort of pattern they are looking for. The weight values are simply adjusted to minimize the cost.
If detecting an edge, or a line orientation, is useful to minimize the cost, that’s just how the weights will turn out. There’s no pre-determined assumption about what the neurons will learn.
Andrew is just giving some intuitive context for how you might understand what the NN is doing, and why it gives useful results.
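I disagree that neurons initialized with the same set of weights will go on to capture different features. If every neuron starts with identical weights, the neurons in a layer compute the same output and receive the same gradient, so they remain copies of each other. Please refer to this post for why neurons can differentiate. In the post, there is another link to a post with some mathematical explanations.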
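To make the symmetry point concrete, here is a minimal NumPy sketch. It is my own illustration, not code from the linked post or the course, and the network shape (3 inputs, 4 tanh hidden units, one linear output, squared-error cost), the constant 0.5, and the function name `hidden_weight_gradient` are all assumptions chosen for the demo. It compares the gradient of the hidden-layer weights under identical versus random initialization.

```python
import numpy as np

def hidden_weight_gradient(W1, b1, W2, b2, X, y):
    """Return dJ/dW1 for a one-hidden-layer network (tanh hidden units,
    linear output) with cost J = 1/(2m) * sum((y_hat - y)^2)."""
    m = X.shape[0]
    Z1 = X @ W1 + b1              # hidden pre-activations, shape (m, n_hidden)
    A1 = np.tanh(Z1)              # hidden activations
    y_hat = A1 @ W2 + b2          # predictions, shape (m, 1)
    dY = (y_hat - y) / m          # dJ/dy_hat
    dA1 = dY @ W2.T               # chain rule back through the output layer
    dZ1 = dA1 * (1.0 - A1 ** 2)   # derivative of tanh
    return X.T @ dZ1              # shape (n_in, n_hidden); one column per neuron

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
X = rng.normal(size=(8, n_in))    # made-up inputs
y = rng.normal(size=(8, 1))       # made-up targets
b1 = np.zeros((1, n_hidden))
b2 = np.zeros((1, 1))

# Case 1: every weight starts at the same value.
W1_same = np.full((n_in, n_hidden), 0.5)
W2_same = np.full((n_hidden, 1), 0.5)
grad_same = hidden_weight_gradient(W1_same, b1, W2_same, b2, X, y)

# Case 2: weights start at different random values.
W1_rand = rng.normal(scale=0.5, size=(n_in, n_hidden))
W2_rand = rng.normal(scale=0.5, size=(n_hidden, 1))
grad_rand = hidden_weight_gradient(W1_rand, b1, W2_rand, b2, X, y)

# Compare every neuron's gradient column with the first neuron's column.
columns_identical_same = bool(np.allclose(grad_same, grad_same[:, [0]]))
columns_identical_rand = bool(np.allclose(grad_rand, grad_rand[:, [0]]))

print("identical init, all neurons get the same gradient:", columns_identical_same)
print("random init, all neurons get the same gradient:", columns_identical_rand)
```

With identical initialization every hidden neuron gets the same gradient column, so a gradient descent step moves them all the same way and they never diverge. With random initialization the columns differ, which is what gives each neuron room to settle on a different feature.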
Thank you very much for the reply! That post you shared that shows the back propagation calculations is extremely helpful! I think that would be a great addition to the lecture slides!
Unfortunately, this is not something that we can consciously control. It is the magic of the math that decides all the pieces, so that they can all cohesively contribute to the output at the final layer.
We set the learning algorithm the task of reducing the cost $J$, using the gradient $\frac{dJ}{dw}$ to decide how to adjust each weight. We further let it apply the chain rule so that the gradient of $J$ can backpropagate through all the layers. From here on the math takes over. The end result is that different neurons learn different features (edges, parts of the image, etc.). In this manner, each neuron contributes to the final prediction.
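As a rough sketch of what the chain rule looks like here, assume a network with one hidden layer, a single output unit, and a single training example, and borrow the course's superscript notation for layers ($g$ is the hidden activation function and $\alpha$ the learning rate; these choices are assumptions for illustration). The gradient for the weights of hidden neuron $j$ is then:

```latex
\frac{\partial J}{\partial w^{[1]}_j}
  = \frac{\partial J}{\partial z^{[2]}}
    \cdot \frac{\partial z^{[2]}}{\partial a^{[1]}_j}
    \cdot \frac{\partial a^{[1]}_j}{\partial z^{[1]}_j}
    \cdot \frac{\partial z^{[1]}_j}{\partial w^{[1]}_j}
  = \frac{\partial J}{\partial z^{[2]}}
    \; w^{[2]}_j
    \; g'\!\left(z^{[1]}_j\right)
    \; x,
\qquad
w^{[1]}_j := w^{[1]}_j - \alpha \, \frac{\partial J}{\partial w^{[1]}_j}
```

The factors $w^{[2]}_j$ and $g'(z^{[1]}_j)$ depend on neuron $j$'s own current weights, so neurons that start with different weights receive different updates and drift toward different features.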