My understanding is that the error on a layer is:
E = np.dot(weights_to_next_layer, error_on_next_layer) * activation_grad(current_layer)
but in the Week 4 assignment of the Probabilistic Models course, we apply ReLU to the dot product instead.
I understand that the gradient of ReLU is either 0 or 1, but that value should come from the hidden layer's activations (or pre-activations) in this case, not from applying ReLU to the backpropagated error itself.
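To make the comparison concrete, here is a minimal NumPy sketch of the two versions as I understand them; the shapes, the seed, and names like `z_hidden` are just illustrative assumptions, not taken from the assignment:

```python
import numpy as np

def relu_grad(z):
    # ReLU derivative: 1 where the input is positive, 0 elsewhere.
    return (z > 0).astype(float)

# Illustrative shapes (assumptions): 4 hidden units, 3 units in the next layer.
rng = np.random.default_rng(0)
weights_to_next_layer = rng.standard_normal((4, 3))  # (current units, next units)
error_on_next_layer = rng.standard_normal((3, 1))
z_hidden = rng.standard_normal((4, 1))               # hidden pre-activations
a_hidden = np.maximum(0, z_hidden)                   # hidden activations

# The rule as I understand it: backpropagate the error through the weights,
# then mask it with the ReLU gradient evaluated at the hidden layer.
E = np.dot(weights_to_next_layer, error_on_next_layer) * relu_grad(z_hidden)

# Since a_hidden = relu(z_hidden), the mask (a_hidden > 0) equals (z_hidden > 0),
# so the gradient can indeed be read off the activations.
assert np.array_equal(relu_grad(z_hidden), (a_hidden > 0).astype(float))

# What the assignment seems to do instead: apply ReLU to the dot product itself.
E_alt = np.maximum(0, np.dot(weights_to_next_layer, error_on_next_layer)))
```

In general `E` and `E_alt` are not the same, which is the source of my confusion.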
Any thoughts?