C2W1 Individual Neurons and Classification

rmwkwok · June 19, 2022, 2:42am

Hello @jimming, it’s a great question. In short, neurons differentiate during training because they are different at the beginning: they have different initial parameter values. In other words, you can make sure they do not differentiate by setting some neuron parameters to be the same initially. I will show you how at the end.

Let’s explain why they can differentiate using a network with a layer of 2 neurons followed by a layer of 1 neuron, as illustrated by the following graph.

From the left, the input has 2 features, and, as you said, a copy of them is sent to each of the 2 neurons in the 1st layer. The internal work of each neuron is shown by the 2 math equations inside it, which you should be familiar with after W1. The outputs a_1, a_2 are fed to the 2nd layer, which does the same internal work and produces a_3 for calculating the loss.
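A minimal pure-Python sketch of this forward phase follows. It assumes a sigmoid activation in every neuron (the usual W1 choice, but an assumption here), drops the bias terms, and uses hypothetical weights and inputs; `forward` is a helper name introduced only for illustration. Neuron 1 holds w_1, w_2, neuron 2 holds w_3, w_4, and the output neuron weights a_1 by w_5 and a_2 by w_6.

```python
import math

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^(-z)) (assumed activation)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x2, w):
    """Forward pass of the 2-1 network, with no bias terms.

    w is a dict with keys w1..w6: neuron 1 uses (w1, w2), neuron 2 uses
    (w3, w4), and the output neuron weights a1 by w5 and a2 by w6.
    """
    z1 = w["w1"] * x1 + w["w2"] * x2   # neuron 1 of layer 1
    a1 = sigmoid(z1)
    z2 = w["w3"] * x1 + w["w4"] * x2   # neuron 2 sees the same copy of x1, x2
    a2 = sigmoid(z2)
    z3 = w["w5"] * a1 + w["w6"] * a2   # layer 2 combines a1 and a2
    a3 = sigmoid(z3)                   # a3 is what the loss is computed from
    return a1, a2, a3

# Hypothetical weights and input, chosen only for illustration
w = {"w1": 0.5, "w2": -0.3, "w3": 0.2, "w4": 0.8, "w5": 1.0, "w6": -0.7}
a1, a2, a3 = forward(1.0, 2.0, w)
print(a1, a2, a3)
```

Swapping in weights with w_1 = w_3 and w_2 = w_4 makes a_1 and a_2 identical, which is the starting point for the symmetry discussed at the end.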

Above is the forward phase. Next comes the key part, the backward propagation phase, because our question is how the neuron weights (w_1, w_2, …) can change differently, and that difference is solely determined by the gradients (\frac{\partial{J}}{\partial{w_1}}, …), since this is the rule by which a weight is updated: w_1 := w_1 - \alpha\frac{\partial{J}}{\partial{w_1}}.

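So we can’t avoid looking at \frac{\partial{J}}{\partial{w_1}}. By the chain rule, it can easily be thought of as multiplying a chain of gradients tracing back from J to w_1, and so we have:

\frac{\partial{J}}{\partial{w_1}} = \frac{\partial{J}}{\partial{a_3}}\frac{\partial{a_3}}{\partial{z_3}}\frac{\partial{z_3}}{\partial{a_1}}\frac{\partial{a_1}}{\partial{z_1}}\frac{\partial{z_1}}{\partial{w_1}} = \frac{\partial{J}}{\partial{a_3}}\frac{\partial{a_3}}{\partial{z_3}}\,w_5\,\frac{\partial{a_1}}{\partial{z_1}}\,x_1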

OK, how do we read the chain rule for \frac{\partial{J}}{\partial{w_1}}? J depends on a_3, which depends on z_3, which depends on a_1, which … until … which depends on w_1. You can follow this chain backward through the network graph at the top.
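The chain can also be checked numerically. This sketch again assumes sigmoid units and, additionally, a binary cross-entropy loss for J (the discussion does not fix a particular loss); `loss`, `dJ_dw1_chain` and `dJ_dw1_numeric` are hypothetical helper names. It multiplies the five factors of the chain for \frac{\partial{J}}{\partial{w_1}} one by one and compares the product with a central finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(x1, x2, y, w):
    """Assumed loss: binary cross-entropy J for a single example."""
    a1 = sigmoid(w["w1"] * x1 + w["w2"] * x2)
    a2 = sigmoid(w["w3"] * x1 + w["w4"] * x2)
    a3 = sigmoid(w["w5"] * a1 + w["w6"] * a2)
    return -(y * math.log(a3) + (1 - y) * math.log(1 - a3))

def dJ_dw1_chain(x1, x2, y, w):
    """dJ/dw1 as the product along the chain J <- a3 <- z3 <- a1 <- z1 <- w1."""
    z1 = w["w1"] * x1 + w["w2"] * x2
    a1 = sigmoid(z1)
    a2 = sigmoid(w["w3"] * x1 + w["w4"] * x2)
    a3 = sigmoid(w["w5"] * a1 + w["w6"] * a2)
    dJ_da3 = -y / a3 + (1 - y) / (1 - a3)   # derivative of the assumed loss
    da3_dz3 = a3 * (1 - a3)                 # sigmoid derivative
    dz3_da1 = w["w5"]                       # this factor is w6 in dJ/dw3
    da1_dz1 = a1 * (1 - a1)                 # sigmoid derivative
    dz1_dw1 = x1                            # this factor is x2 in dJ/dw2
    return dJ_da3 * da3_dz3 * dz3_da1 * da1_dz1 * dz1_dw1

def dJ_dw1_numeric(x1, x2, y, w, eps=1e-6):
    """Central finite difference, used only to check the chain rule."""
    plus = dict(w, w1=w["w1"] + eps)
    minus = dict(w, w1=w["w1"] - eps)
    return (loss(x1, x2, y, plus) - loss(x1, x2, y, minus)) / (2 * eps)

# Hypothetical weights and example, chosen only for illustration
w = {"w1": 0.5, "w2": -0.3, "w3": 0.2, "w4": 0.8, "w5": 1.0, "w6": -0.7}
print(dJ_dw1_chain(1.0, 2.0, 1, w), dJ_dw1_numeric(1.0, 2.0, 1, w))
```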

Here I specifically calculated \frac{\partial{z_3}}{\partial{a_1}} and \frac{\partial{z_1}}{\partial{w_1}} (which are w_5 and x_1 respectively) because, if you compare them with the corresponding factors in the other gradients, they are the reason why the gradients are different! Can you see that?

For example,
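\frac{\partial{J}}{\partial{w_1}} and \frac{\partial{J}}{\partial{w_3}} are different because w_5 and w_6 are different (and, whenever the two neurons' weights differ, also because \frac{\partial{a_1}}{\partial{z_1}} and \frac{\partial{a_2}}{\partial{z_2}} differ).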

\frac{\partial{J}}{\partial{w_1}} and \frac{\partial{J}}{\partial{w_2}} are different because x_1 and x_2 are different.
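To isolate the two factors named above, the following sketch (same assumptions: sigmoid units, binary cross-entropy loss, no biases, hypothetical numbers; `layer1_grads` is an illustrative helper) starts both neurons with identical weights, so that \frac{\partial{a_1}}{\partial{z_1}} = \frac{\partial{a_2}}{\partial{z_2}} and the remaining differences between the gradients come only from w_5 versus w_6 and x_1 versus x_2.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer1_grads(x1, x2, y, w):
    """dJ/dw1..dJ/dw4 via the chain rule, assuming sigmoid units and
    binary cross-entropy, for which dJ/dz3 simplifies to a3 - y."""
    z1 = w["w1"] * x1 + w["w2"] * x2
    z2 = w["w3"] * x1 + w["w4"] * x2
    a1, a2 = sigmoid(z1), sigmoid(z2)
    a3 = sigmoid(w["w5"] * a1 + w["w6"] * a2)
    dJ_dz3 = a3 - y
    # Neuron 1 receives the error through w5, neuron 2 through w6
    dJ_dz1 = dJ_dz3 * w["w5"] * a1 * (1 - a1)
    dJ_dz2 = dJ_dz3 * w["w6"] * a2 * (1 - a2)
    return {
        "w1": dJ_dz1 * x1, "w2": dJ_dz1 * x2,   # same neuron, different inputs
        "w3": dJ_dz2 * x1, "w4": dJ_dz2 * x2,   # other neuron, same inputs
    }

# Hypothetical values: the two neurons start identical (w1 = w3, w2 = w4)
w = {"w1": 0.4, "w2": -0.2, "w3": 0.4, "w4": -0.2, "w5": 0.9, "w6": -0.5}
g = layer1_grads(1.0, 3.0, 0, w)
print(g["w1"] / g["w3"], w["w5"] / w["w6"])   # the two ratios agree (up to rounding)
print(g["w1"] / g["w2"], 1.0 / 3.0)           # ratio set by x1 vs x2
```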

Since the gradients are different and every weight is updated by w_? := w_? - \alpha\frac{\partial{J}}{\partial{w_?}}, the weights are updated differently!

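Now, as I promised at the beginning, here is a way to make sure those weights can’t differentiate. As you may have already noticed, you need the two first-layer neurons to start with identical weights (w_1 = w_3 and w_2 = w_4) and the second layer to treat them identically (w_5 = w_6). Below is a code snippet for you to achieve that: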
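A minimal pure-Python sketch of the idea, assuming sigmoid units, a binary cross-entropy loss, batch gradient descent, and a hypothetical four-example toy dataset; `train` is a helper introduced here for illustration. It trains once from a symmetric start and once from a random start, then compares the first-layer weights.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(w, data, alpha=0.5, epochs=500):
    """Batch gradient descent on the 2-1 network (no biases), assuming
    sigmoid units and binary cross-entropy. Returns the updated weights."""
    w = dict(w)
    for _ in range(epochs):
        grad = {k: 0.0 for k in w}
        for x1, x2, y in data:
            a1 = sigmoid(w["w1"] * x1 + w["w2"] * x2)
            a2 = sigmoid(w["w3"] * x1 + w["w4"] * x2)
            a3 = sigmoid(w["w5"] * a1 + w["w6"] * a2)
            d3 = a3 - y                          # dJ/dz3
            d1 = d3 * w["w5"] * a1 * (1 - a1)    # dJ/dz1
            d2 = d3 * w["w6"] * a2 * (1 - a2)    # dJ/dz2
            for k, v in (("w1", d1 * x1), ("w2", d1 * x2), ("w3", d2 * x1),
                         ("w4", d2 * x2), ("w5", d3 * a1), ("w6", d3 * a2)):
                grad[k] += v / len(data)         # average over the batch
        for k in w:                              # w := w - alpha * dJ/dw
            w[k] -= alpha * grad[k]
    return w

# Hypothetical toy dataset of (x1, x2, y) triples
data = [(0.5, 1.0, 0), (1.5, -0.5, 1), (-1.0, 2.0, 0), (2.0, 0.5, 1)]

# Symmetric start: w1 = w3, w2 = w4 and w5 = w6
sym = train({"w1": 0.3, "w2": -0.1, "w3": 0.3, "w4": -0.1, "w5": 0.2, "w6": 0.2}, data)
print(sym["w1"] == sym["w3"], sym["w2"] == sym["w4"])   # neurons stay identical

# Random start: the two neurons differentiate
random.seed(0)
rnd = train({k: random.uniform(-1, 1) for k in ("w1", "w2", "w3", "w4", "w5", "w6")}, data)
print(rnd["w1"] == rnd["w3"], rnd["w2"] == rnd["w4"])
```

In the symmetric run the equality checks hold exactly, because every floating-point operation for neuron 1 is mirrored for neuron 2 at every step; in the random run they do not.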

Note that w_1 = w_3 and w_2 = w_4 both before and after training; in other words, the two neurons in the first layer do not differentiate!

P.S. I dropped the bias terms in the above discussion to keep it simple, but the idea does not change when they are included.
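To see why, suppose hypothetical bias terms b_1 and b_2 (names introduced here only for illustration) are added to the first-layer neurons, so that z_1 = w_1x_1 + w_2x_2 + b_1 and z_2 = w_3x_1 + w_4x_2 + b_2. Since \frac{\partial{z_1}}{\partial{b_1}} = \frac{\partial{z_2}}{\partial{b_2}} = 1, the same chain rule gives the bias gradients below. Their only differing factors are again w_5 versus w_6 and \frac{\partial{a_1}}{\partial{z_1}} versus \frac{\partial{a_2}}{\partial{z_2}}, so additionally starting with b_1 = b_2 keeps the two neurons identical.

```latex
\frac{\partial{J}}{\partial{b_1}} = \frac{\partial{J}}{\partial{a_3}}\frac{\partial{a_3}}{\partial{z_3}}\,w_5\,\frac{\partial{a_1}}{\partial{z_1}}\cdot 1
\qquad
\frac{\partial{J}}{\partial{b_2}} = \frac{\partial{J}}{\partial{a_3}}\frac{\partial{a_3}}{\partial{z_3}}\,w_6\,\frac{\partial{a_2}}{\partial{z_2}}\cdot 1
```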

Related topics

Topic	Replies	Views	Activity
What makes different neurons calculate different parameters within a layer? (Advanced Learning Algorithms, week-module-1)	21	1170	July 8, 2024
How do units within the same layer end up with different weights? (Advanced Learning Algorithms, week-module-2)	3	721	July 28, 2022
Understanding how Neural Networks Learn (Advanced Learning Algorithms, week-module-1)	7	571	February 16, 2023
What is the effect random initialization of W on multiple nodes and in neural network when all of them are doing the same thing (Neural Networks and Deep Learning, week-module-3, coursera-platform)	6	184	May 24, 2024
Hidden layer first iteration neural network (Neural Networks and Deep Learning, coursera-platform)	2	661	January 18, 2022