My understanding is that the error on a layer is:
E = np.dot(weights_to_next_layer, error_on_next_layer) * activation_grad(current_layer)
but in the Week 4 assignment of the Probabilistic Models course, we apply ReLU to the dot product instead.
I understand that the gradient of ReLU is either 0 or 1, but that value should come from the hidden layer's activations (or pre-activations) in this case, not from applying ReLU to the backpropagated error itself.
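To make the comparison concrete, here is a minimal NumPy sketch of the two versions as I understand them; the shapes, the seed, and names like `z_hidden` are just illustrative assumptions, not taken from the assignment:

```python
import numpy as np

def relu_grad(z):
    # ReLU derivative: 1 where the input is positive, 0 elsewhere.
    return (z > 0).astype(float)

# Illustrative shapes (assumptions): 4 hidden units, 3 units in the next layer.
rng = np.random.default_rng(0)
weights_to_next_layer = rng.standard_normal((4, 3))  # (current units, next units)
error_on_next_layer = rng.standard_normal((3, 1))
z_hidden = rng.standard_normal((4, 1))               # hidden pre-activations
a_hidden = np.maximum(0, z_hidden)                   # hidden activations

# The rule as I understand it: backpropagate the error through the weights,
# then mask it with the ReLU gradient evaluated at the hidden layer.
E = np.dot(weights_to_next_layer, error_on_next_layer) * relu_grad(z_hidden)

# Since a_hidden = relu(z_hidden), the mask (a_hidden > 0) equals (z_hidden > 0),
# so the gradient can indeed be read off the activations.
assert np.array_equal(relu_grad(z_hidden), (a_hidden > 0).astype(float))

# What the assignment seems to do instead: apply ReLU to the dot product itself.
E_alt = np.maximum(0, np.dot(weights_to_next_layer, error_on_next_layer)))
```

In general `E` and `E_alt` are not the same, which is the source of my confusion.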
Any thoughts?