W4_Forward and Backward Propagation - 2 ReLU's in a row?

I’m going through the 6th video, and in an example deep neural network, Dr. Ng draws this:

I know that this is just an off-hand example that was probably created on the fly, but I want to confirm my understanding: using 2 ReLU activation functions in a row is equivalent to using just one, right? So in a real deep neural network, we would most likely not do this.

Hello @Apoorva_Dixit1, welcome to our community!

Yes! ReLU(ReLU(x)) is just ReLU(x).
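A quick way to convince yourself of this idempotence (a minimal sketch; `relu` here is just a hand-rolled helper, not a course-provided function):

```python
import numpy as np

def relu(x):
    # ReLU clamps negatives to zero and leaves non-negatives unchanged
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# After one ReLU pass, every value is already non-negative,
# so a second pass changes nothing.
print(np.array_equal(relu(relu(x)), relu(x)))  # True
```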

However, this is NOT the message behind the slide. Each box in the slide represents a Dense layer, and writing “ReLU” in a box means that the Dense layer uses a ReLU activation. Therefore, the slide is talking about ReLU( W^{[2]} \times ReLU( W^{[1]} x+ b^{[1]}) + b^{[2]}).


But W^{[1]} x + b^{[1]} is also a linear transformation on x, right? So it's basically 4 nested linear transformations. Which would be equivalent to using ReLU in one node?

We can look at this:

ReLU( W^{[2]} \times ReLU( W^{[1]} x+ b^{[1]}) + b^{[2]})

If the inner W^{[1]} x + b^{[1]} is -1, W^{[2]} is -1, and b^{[2]} is 0, then ReLU(-1) = 0, so the whole expression evaluates to ReLU(-1 \times 0 + 0) = ReLU(0), which is 0.

If we take away the inner ReLU, so that it becomes ReLU( W^{[2]} \times ( W^{[1]} x + b^{[1]}) + b^{[2]}), then the answer changes to ReLU( -1 \times -1 + 0), which is 1.

So, if we take away the inner ReLU, we will be computing something different; in this case, the two expressions are not equivalent.
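The comparison above can be checked numerically. This is just a sketch plugging in the numbers from the example (inner pre-activation -1, W^{[2]} = -1, b^{[2]} = 0), with scalars standing in for the weight matrices:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

inner = -1.0        # stands in for W1 @ x + b1
W2, b2 = -1.0, 0.0  # the example's second-layer parameters

with_inner_relu = relu(W2 * relu(inner) + b2)  # ReLU(-1 * 0 + 0) = 0
without_inner_relu = relu(W2 * inner + b2)     # ReLU(-1 * -1 + 0) = 1
print(with_inner_relu, without_inner_relu)     # 0.0 1.0
```

The two results differ, so the inner ReLU genuinely changes what the network computes.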

