Hi @vanooshe
Here is my explanation of the diagram:
Note that:
- step(z1) is 0 where z1 < 0 and 1 everywhere else;
- the final l1 value is the initial l1 multiplied element-wise (not a dot product) by step(z1); the diagram could be clearer on that. This is implemented for you: `l1[z1 < 0] = 0` is equivalent to element-wise multiplication by 1s and 0s. The resulting array is the one you are meant to use in the further calculations (`# use "l1" to compute gradients below`), and it is the biggest part of them;
- also note that h here is not equal to l1 or z1, since h here is equal to relu(z1);
- also note that you don’t have to dot-multiply by 1^{T}_{m}; you can use `np.sum(?, axis=?)` instead, since the two are equivalent.
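In case it helps, here is a small sketch of the two equivalences from the list above, on toy arrays. The shapes, the random values, and every name other than `z1`, `l1` and `h` are my own assumptions for illustration, not the assignment's code, and the `axis` used here only fits this toy layout (samples along the rows), so check it against your own shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3  # hypothetical: m samples, n hidden units

z1 = rng.normal(size=(m, n))  # toy pre-activations
l1 = rng.normal(size=(m, n))  # toy stand-in for the upstream gradient

# step(z1): 0 where z1 < 0, 1 everywhere else
step = (z1 >= 0).astype(l1.dtype)

# Element-wise multiplication by step(z1) ...
masked_by_product = l1 * step

# ... gives the same array as zeroing entries in place
masked_in_place = l1.copy()
masked_in_place[z1 < 0] = 0
assert np.array_equal(masked_by_product, masked_in_place)

# h = relu(z1) is a different array from both z1 and l1
h = np.maximum(z1, 0)
assert np.array_equal(h, z1 * step)

# Multiplying by a row vector of ones is the same as summing over that axis
ones_row = np.ones((1, m))
via_dot = ones_row @ masked_in_place                      # shape (1, n)
via_sum = np.sum(masked_in_place, axis=0, keepdims=True)  # shape (1, n)
assert np.allclose(via_dot, via_sum)
```

The in-place masking and `np.sum` avoid building the extra 0/1 array and the vector of ones, which is why they are the more common way to write this in NumPy.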
Since you commented on this thread, I assume you have seen the calculations that you can check your solution against.
Cheers