- Layer 0 is the input x, which is a vector? of features.
- The NN consists of l layers
- Each layer has a different number of neurons (j)? Does it make sense to have the same number? So j is different for each layer
- Each neuron has parameters w and b where the size of w depends on the number of neurons in the previous layer? and for each neuron the size of b is the same but the values are different?
- Does that mean that for forward propagation w is actually a 3D array (Tensor?)
- b is a matrix (2D) of values for every neuron in each layer, but the length is not the same since layers have different numbers of neurons.
- So what do we put in the missing places?
Hello, @sdabach, my responses are in place below, in bold font.
- Each neuron has parameters w and b where the size of w depends on the number of neurons in the previous layer? **(Yes)** and for each neuron the size of b is the same but the values are different?
- Does that mean that for forward propagation w is actually a 3D array (Tensor?) **(No, w has 2 dimensions: the size of the first dimension is the number of neurons in the previous layer and the size of the second dimension is the number of neurons in the current layer.)** So how do you store w for the different layers after the training? Do you create a parameter w_1…w_l for each layer? Same question for b.
Yes, we have one w for each layer. Each w is a 2D array. We don’t stack them together to form a 3D array because they can have different shapes.
Let’s discuss that with an example.
Let’s say we have an input of 2 features and 3 layers of 5, 7, 1 neurons respectively, so we have the following weight shapes: (2, 5), (5, 7), and (7, 1) – all 2D arrays that can’t simply be stacked together into a 3D array.
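
To make the per-layer storage concrete, here is a minimal NumPy sketch (my own illustration, not code from the course) that keeps one 2D weight matrix and one bias vector per layer using the shapes above; the names `parameters` and `forward` are just placeholders.

```python
import numpy as np

# Minimal illustration (not course code): one 2D weight matrix and one bias
# vector per layer, using the shapes from the example above.
layer_sizes = [2, 5, 7, 1]           # 2 input features, then layers of 5, 7, 1 neurons
rng = np.random.default_rng(0)

parameters = []                      # one (W, b) pair per layer, kept in a list
for n_prev, n_curr in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = rng.standard_normal((n_prev, n_curr)) * 0.01   # shapes: (2, 5), (5, 7), (7, 1)
    b = np.zeros((1, n_curr))                          # shapes: (1, 5), (1, 7), (1, 1)
    parameters.append((W, b))

def forward(x, parameters):
    """Forward propagation: each layer multiplies by its own 2D W and adds its b."""
    a = x
    for W, b in parameters:
        z = a @ W + b                 # (m, n_prev) @ (n_prev, n_curr) -> (m, n_curr)
        a = 1.0 / (1.0 + np.exp(-z))  # sigmoid activation, just for illustration
    return a

x = rng.standard_normal((4, 2))      # a batch of 4 examples, 2 features each
print(forward(x, parameters).shape)  # (4, 1)
```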
The vanilla gradient descent formula is:

$w := w - \alpha \dfrac{\partial J}{\partial w}$
So, before and after a learning step, the shape of the weight remains 2D.
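
As a quick illustration (again my own snippet, not from the assignment): the gradient dW produced by backprop has the same shape as W, so the update is element-wise and the weight stays 2D.

```python
import numpy as np

# dW has the same shape as W, so the element-wise update W := W - alpha * dW
# cannot change the 2D shape of the weight matrix. (Illustrative values only.)
alpha = 0.01
W = np.random.randn(5, 7)    # weight matrix between the 5- and 7-neuron layers
dW = np.random.randn(5, 7)   # gradient from backprop, same shape as W
W = W - alpha * dW
print(W.shape)               # (5, 7) -- still the same 2D shape after the step
```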
Cheers,
Raymond
