Can you please explain what the w_1^{[1]T}, w_2^{[1]T}, w_3^{[1]T} and w_4^{[1]T} matrices are? I am really confused.
It sounds like you are asking about what is shown in this screenshot from about 4:50 into the Week 3 lecture “Computing A Neural Network’s Output”:
What is happening there is that Prof Ng is starting with the individual equations for the output of each neuron, using the same format he used in the Logistic Regression case. So for example he shows:
z_1^{[1]} = w_1^{[1]T} \cdot x + b_1^{[1]}
Note that I’ve added one little “extra” there by making the dot product operation explicit between the w vector and the x vector.
So in that formulation, the vector w_1^{[1]} is the weight vector for the first neuron in the first layer (the subscript indexes the neuron and the superscript [1] indexes the layer). He formats w_1^{[1]} as a column vector, just as he did in the Logistic Regression case. It has dimension n_x x 1, where n_x is the number of input features (elements in each input x vector). x is also an n_x x 1 column vector, so in order for that dot product to work, we need to transpose the w vector: dotting 1 x n_x with n_x x 1 gives you a 1 x 1, i.e. a scalar, output.
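To make that concrete, here is a minimal numpy sketch of that single-neuron computation (the sizes and values below are just placeholders for illustration, not anything from the assignment):

import numpy as np

n_x = 3                        # number of input features (placeholder value)
x = np.random.randn(n_x, 1)    # one input sample as an n_x x 1 column vector
w1 = np.random.randn(n_x, 1)   # weight vector for neuron 1 of layer 1, also n_x x 1
b1 = 0.5                       # scalar bias for that neuron (arbitrary)

# w1.T is 1 x n_x, so w1.T @ x is 1 x 1, effectively a scalar
z1 = w1.T @ x + b1
print(z1.shape)                # (1, 1)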
Then what he does is put all the weight vectors for the output neurons of layer 1 together into a single matrix, so that we can compute all the outputs at once in a vectorized way. But he also wants to make it simpler, so that we don’t need any more transposes on the whole W^{[1]} weight matrix. So he uses the w vectors in their transposed form, so that they are now 1 x n_x row vectors. That means he can stack them up as the rows of the weight matrix W^{[1]}. That’s what he is showing in the lower left section of that diagram.
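As a rough sketch of that stacking step (again with placeholder sizes, and np.vstack as just one of several ways to do it):

import numpy as np

n_x, n1 = 3, 4                                             # placeholder sizes: 3 inputs, 4 neurons in layer 1
w_vectors = [np.random.randn(n_x, 1) for _ in range(n1)]   # one n_x x 1 column vector per neuron

# each w_i^{[1]T} is a 1 x n_x row; stacking the rows gives W^{[1]} of shape (n1, n_x)
W1 = np.vstack([w.T for w in w_vectors])
print(W1.shape)                                            # (4, 3)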
So you end up with W^{[1]} having the dimensions n^{[1]} x n_x, where n^{[1]} is the number of output neurons in layer 1. And because the w vectors from the upper right formulation are now the rows of W^{[1]}, the full vectorized forward propagation becomes:
Z^{[1]} = W^{[1]} \cdot X + b^{[1]}
Where X there is the full sample matrix with each column being one input vector. So if you have m samples, then Z^{[1]} is n^{[1]} x m. Then we apply the activation function “elementwise” to get A^{[1]} so it has the same dimensions as Z^{[1]}.
I didn’t mention the bias values there, but there is one scalar b value for each output neuron. In the final vectorized form, you also “stack” those into a column vector of dimension n^{[1]} x 1. So when you add that vector, it is “broadcast” and adds to each column of the output to compute the final Z^{[1]}.
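Putting those pieces together, a small sketch of the fully vectorized forward step for layer 1 might look like this (all the sizes here are made up, and tanh is just an example activation):

import numpy as np

n_x, n1, m = 3, 4, 5                 # placeholder sizes: 3 features, 4 neurons, 5 samples
X = np.random.randn(n_x, m)          # each column is one input sample
W1 = np.random.randn(n1, n_x)        # rows are the transposed per-neuron weight vectors
b1 = np.random.randn(n1, 1)          # one bias per neuron, stacked as a column vector

Z1 = W1 @ X + b1                     # b1 broadcasts across all m columns, giving (n1, m)
A1 = np.tanh(Z1)                     # elementwise activation, same shape as Z1
print(Z1.shape, A1.shape)            # (4, 5) (4, 5)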
Thanks a lot for the explanation @paulinpaloalto .
I have changed the code
W1 = np.random.rand(n_h, n_x) * 0.01
b1 = np.zeros((n_h, 1))
where n_h = n[1] and n_x = n[0]
So the initialization produces the following values for W1 when called from the test method:
[[0.00435995 0.00025926 0.00549662]
[0.00435322 0.00420368 0.00330335]
[0.00204649 0.00619271 0.00299655]
[0.00266827 0.00621134 0.00529142]
[0.0013458 0.00513578 0.0018444 ]]
with
n_x → 3
n_h → 5
n_y → 2
But the expected value is
W1 = [[-0.00416758 -0.00056267]
[-0.02136196 0.01640271]
[-0.01793436 -0.00841747]
[ 0.00502881 -0.01245288]]
which is a 4 x 2 array.
I am getting the below error now
~/work/release/W3A1/public_tests.py in initialize_parameters_test(target)
57 assert parameters["b2"].shape == expected_output["b2"].shape, f"Wrong shape for b2."
58
---> 59 assert np.allclose(parameters["W1"], expected_output["W1"]), "Wrong values for W1"
60 assert np.allclose(parameters["b1"], expected_output["b1"]), "Wrong values for b1"
61 assert np.allclose(parameters["W2"], expected_output["W2"]), "Wrong values for W2"
AssertionError: Wrong values for W1
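One observation on the output above: np.random.rand samples uniformly from [0, 1), so it can never produce the negative entries shown in the expected W1; those values look like they come from a Gaussian initializer. The sketch below is only a guess at what the test expects, with the seed and layer sizes as assumptions on my part rather than anything taken from the assignment:

import numpy as np

np.random.seed(2)                      # assumed seed, chosen only for illustration
n_x, n_h, n_y = 2, 4, 1                # assumed layer sizes matching a 4 x 2 W1

W1 = np.random.randn(n_h, n_x) * 0.01  # Gaussian values scaled to be small, can be negative
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y, n_h) * 0.01
b2 = np.zeros((n_y, 1))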
Hi Soumak,
I have already replied to this query on the other thread you raised.
Thank you so much for your explanation! It is clear to me now!