When we set up the layer, we initialize its weights to some (typically random) values; these initial weights are what we use to compute the "activations" during the first forward pass of gradient descent.
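To make the role of those initial weights concrete, here is a minimal NumPy sketch; the layer sizes, the small-Gaussian initialization, and the ReLU activation are illustrative assumptions rather than the specific setup discussed here. Before any training happens, the randomly initialized `W` and `b` already determine the activations of the first forward pass.

```python
# Minimal sketch: the randomly initialized weights are what produce the
# activations of the very first forward pass, before any training step.
# Layer sizes, the small-Gaussian initialization, and ReLU are assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_neurons = 4, 3
W = rng.normal(0.0, 0.1, size=(n_inputs, n_neurons))  # initial random weights
b = np.zeros(n_neurons)                               # biases often start at zero

x = rng.normal(size=(1, n_inputs))                    # one input example

z = x @ W + b              # pre-activation computed from the initial weights
a = np.maximum(0.0, z)     # ReLU "activation" for the first round
print(a)
```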
Sorry, I am still a little confused. So, when we set up a neural network model, before we fit any data, it starts with some random weights; running the data through the model with those weights to compute outputs is forward propagation. Then the model does gradient descent to modify/learn the optimal parameters/weights, and computing those gradients is backward propagation.
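If it helps, here is a minimal sketch of the loop described above: random initial weights, a forward pass to compute predictions, then a backward pass that computes gradients, followed by a gradient-descent update. The toy data, the single linear layer, and the learning rate are all illustrative assumptions.

```python
# Minimal training loop: forward pass -> loss -> gradients -> weight update.
# Toy data, a single linear layer, and the learning rate are assumptions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                  # toy inputs
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3       # toy targets

W = rng.normal(0.0, 0.1, size=3)               # random initial weights
b = 0.0
lr = 0.1

for step in range(200):
    y_hat = X @ W + b                          # forward propagation
    err = y_hat - y
    loss = np.mean(err ** 2)                   # mean squared error
    grad_W = 2 * X.T @ err / len(X)            # backward propagation: gradients
    grad_b = 2 * err.mean()
    W -= lr * grad_W                           # gradient-descent update
    b -= lr * grad_b

print(W, b)   # should end up close to [2, -1, 0.5] and 0.3
```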
I am still a little confused about why each neuron ends up with different weights after training. As far as I understand, when we start the forward propagation process, each cell begins with some values for the vector w and the bias. We may select different initial weight values for each cell, but why do they end up at different weight values after training? They all use the same input values, and it is not clear to me why they would converge to different values even if the initial weights are different. Maybe I am missing something in the backward propagation process, or some dependency among the cells' computations?
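One way to see where the differences come from is to trace a single backward pass by hand. In the small sketch below (the toy input, target, tanh activation, and squared-error loss are all my own illustrative assumptions), both hidden neurons receive exactly the same input, yet the two gradient columns in `grad_W1` differ, because each neuron's gradient is scaled by its own activation derivative and its own outgoing weight. If the initial weights were identical, those factors would be identical too and the neurons would stay in lockstep.

```python
# Toy backward pass for a 3-input, 2-hidden-neuron, 1-output network,
# showing that neurons fed the same input still receive different gradients
# when their weights differ. All numbers and choices here are illustrative.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(1, 3))            # one input, fed identically to both hidden neurons
y = np.array([[1.0]])                  # toy target

W1 = rng.normal(0.0, 0.5, size=(3, 2)) # two hidden neurons with different initial weights
b1 = np.zeros((1, 2))
W2 = rng.normal(0.0, 0.5, size=(2, 1)) # different outgoing weights as well
b2 = np.zeros((1, 1))

# forward pass
h = np.tanh(x @ W1 + b1)
y_hat = h @ W2 + b2
err = y_hat - y                        # derivative of 0.5 * squared error

# backward pass: gradient for each hidden neuron's weight vector
grad_h = err @ W2.T                    # each neuron scaled by its own outgoing weight
grad_pre = grad_h * (1 - h ** 2)       # each neuron's own tanh derivative
grad_W1 = x.T @ grad_pre               # the two columns differ, so the updates differ

print(grad_W1)
```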