Hi @vanooshe

Here is my explanation of the diagram:

Note that:

- step(z1) means 0 where z1<0, and 1 everywhere else;
- the final l1 value is the initial l1 *element-wise* multiplied (not *dot* product) with step(z1) (the diagram could be clearer on that). This is already implemented for you: `l1[z1 < 0] = 0` is equivalent to element-wise multiplication by 1s and 0s. The comment `# use "l1" to compute gradients below` suggests using this array in the further calculations, where it makes up the biggest part;
- also note that h here is not equal to l1 or z1, since h here is equal to relu(z1);
- also note that you don’t have to dot multiply by 1^{T}_{m}; you can use `np.sum(?, axis=?)` instead, since these are equivalent.
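If it helps, here is a small sketch you can run to convince yourself of the two equivalences above. The shapes and the random arrays are made up for illustration and are not the assignment's; which axis you need in your own code depends on how your matrices are laid out and on which side the ones vector multiplies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes for illustration only (not the assignment's).
rows, cols = 4, 3
z1 = rng.standard_normal((rows, cols))
l1 = rng.standard_normal((rows, cols))

# step(z1): 0 where z1 < 0, 1 everywhere else.
step = (z1 >= 0).astype(l1.dtype)

# Element-wise multiplication by the 0/1 mask ...
l1_mult = l1 * step

# ... versus the in-place masking from the provided line.
l1_masked = l1.copy()
l1_masked[z1 < 0] = 0

same_mask = np.allclose(l1_mult, l1_masked)

# Multiplying by a vector of ones is a sum along one axis.
ones_row = np.ones((1, rows))  # ones on the left: sums over rows
ones_col = np.ones((cols, 1))  # ones on the right: sums over columns

left_matches = np.allclose(ones_row @ l1_masked,
                           np.sum(l1_masked, axis=0, keepdims=True))
right_matches = np.allclose(l1_masked @ ones_col,
                            np.sum(l1_masked, axis=1, keepdims=True))

print(same_mask, left_matches, right_matches)
```

All three checks should come out `True`. Passing `keepdims=True` keeps the result two-dimensional, so it has the same shape as the matrix product with the ones vector.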

And since you commented on this thread, I assume you have seen the calculations that you can check your solution against.

Cheers