Activation function: ReLU

ahmedemam · April 20, 2024, 5:09pm

Week 4
Why was l1[z1<0] = 0 used, rather than just taking the max between 0 and the l1 value itself (not z1)? What's the idea here? For context, l1 is W2.T x (yhat - y), and z1 is W1 x X + bias1.

TMosh · April 21, 2024, 12:30am

Sorry, I don’t understand your question.

Maybe it would help if you posted a screen capture image from one of the lectures or labs.

jyadav202 · April 21, 2024, 7:48am

Hi!
l1[z1<0] = 0 acts as a step function: it implements the multiplication of the gradient by the ReLU derivative.
What I mean is that while calculating the gradients with the chain rule, we arrive at a point where we have to compute:

\begin{align}
\text{gradient} * \text{derivative(activation)} &= W_2^T(\hat y - y) * \mathrm{ReLU}'(Z_1) \\
&= \begin{cases} W_2^T(\hat y - y) * 1, & \text{if } Z_1 > 0 \\ W_2^T(\hat y - y) * 0, & \text{if } Z_1 < 0 \end{cases}
\end{align}

In Python we can implement the above condition either by multiplying l1 by a step function of z1 or, equivalently, with l1[z1<0] = 0
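A minimal NumPy sketch of that equivalence, in case it helps. The array values and the helper name relu_backward are made up for illustration and are not taken from the assignment:

```python
import numpy as np

def relu_backward(l1, z1):
    """Zero the upstream gradient l1 wherever the pre-activation z1 is negative.

    Hypothetical helper for illustration; it works on a copy so l1 is not modified.
    """
    out = l1.copy()
    out[z1 < 0] = 0
    return out

# Made-up values: z1 is the pre-activation, l1 stands in for W2.T @ (yhat - y)
z1 = np.array([[-1.0, 2.0], [3.0, -4.0]])
l1 = np.array([[0.5, -0.7], [-0.2, 0.9]])

masked = relu_backward(l1, z1)   # boolean-mask version
stepped = l1 * (z1 >= 0)         # multiply by a 0/1 step function of z1

print(masked)
print(np.array_equal(masked, stepped))
```

Both versions keep the entries of l1 where z1 is not negative (including the negative ones) and zero out the rest, so the mask depends only on z1, never on the sign of l1.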

jyadav202 · April 21, 2024, 9:20am

Also note that we are taking the gradient of ReLU here. From your question, it seems you may have understood that we have to apply ReLU itself.
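To make the difference concrete, here is a small sketch with made-up numbers (not from the assignment) comparing the gradient mask with applying ReLU directly to l1:

```python
import numpy as np

# Made-up values for illustration
z1 = np.array([-1.0, 2.0, 3.0])   # pre-activation of the hidden layer
l1 = np.array([0.5, -0.7, 0.2])   # upstream gradient, standing in for W2.T @ (yhat - y)

# Backprop through ReLU: zero the gradient where z1 < 0
grad = l1.copy()
grad[z1 < 0] = 0

# Applying ReLU to the gradient itself: zero it where l1 < 0
clipped = np.maximum(0, l1)

print(grad)
print(clipped)
```

The two results differ: the mask removes the first entry because z1 is negative there, while np.maximum(0, l1) would instead throw away the second entry, which is a valid negative gradient for a unit that was active.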

ahmedemam · April 21, 2024, 10:09am

Oh, this makes sense now, thank you.
