Can anyone explain the equations for grad_b1 and grad_b2 from the lecture? What is meant by step(Z1) and the matrix 1_m?
Hi @Amit_Gairola1,
grad_b1 and grad_b2 are the gradients w.r.t. the biases, i.e. $\frac{\partial J_{batch}}{\partial b_1}$ and $\frac{\partial J_{batch}}{\partial b_2}$ respectively. The step function is needed for the backward propagation through the ReLU non-linearity: for every positive element of $Z_1$ it outputs one, and for all other elements it outputs zero. $1_m$ is a row vector containing $m$ elements, all equal to 1. As you can see on the slide, the result of $A \cdot 1^\top_m$ is equivalent to summing the elements of each row of the matrix $A$.
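Here is a minimal NumPy sketch of both ideas. The shapes and variable names (hidden size 3, batch size m = 4, dZ1 as the upstream gradient) are my own assumptions for illustration, not taken from the assignment:

```python
import numpy as np

m = 4                                # number of examples in the batch (assumed)
Z1 = np.random.randn(3, m)           # hidden-layer pre-activations, one column per example
dZ1 = np.random.randn(3, m)          # upstream gradient flowing back into Z1 (placeholder values)

# step(Z1): 1 where Z1 > 0, 0 elsewhere -- the elementwise derivative of ReLU.
step_Z1 = (Z1 > 0).astype(float)

# Backprop through ReLU: elementwise product with step(Z1).
grad_Z1 = dZ1 * step_Z1

# 1_m is a row vector of m ones; multiplying a matrix by its transpose sums each row.
ones_m = np.ones((1, m))
grad_b1_via_ones = grad_Z1 @ ones_m.T                      # shape (3, 1)
grad_b1_via_sum = grad_Z1.sum(axis=1, keepdims=True)       # same result

assert np.allclose(grad_b1_via_ones, grad_b1_via_sum)
```

The assertion checks the point made on the slide: multiplying by $1^\top_m$ is just a compact way of writing a row-wise sum over the batch, which is why the bias gradient ends up as a column vector.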