Hey @Carl_Merrigan,

That’s an intriguing question. Quite frankly, I was stumped for some time about where to begin, but let’s start anyway. First, let’s state what we already know:

\frac{\partial{J}}{\partial{H}} = \frac{1}{m} W_2^T(\hat{Y} - Y), \\
\frac{\partial{Z_1}}{\partial{W_1}} = X, \\
\frac{\partial{J}}{\partial{W_1}} = \frac{\partial{J}}{\partial{H}} \frac{\partial{H}}{\partial{Z_1}} \frac{\partial{Z_1}}{\partial{W_1}}

Now, the quantity in focus is \frac{\partial{H}}{\partial{Z_1}}. We already know that H = ReLU(Z_1), so:

Z_1 > 0; H = Z_1; \frac{\partial{H}}{\partial{Z_1}} = 1 \\
Z_1 <= 0; H = 0; \frac{\partial{H}}{\partial{Z_1}} = 0
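As a quick sanity check, here is how these two cases look in NumPy (the array values below are made up purely for illustration):

```python
import numpy as np

# Made-up pre-activation values, chosen to cover both cases
Z1 = np.array([[-2.0, -0.5, 0.0, 0.5, 2.0]])

H = np.maximum(0, Z1)            # ReLU forward: H = max(0, Z1)
dH_dZ1 = (Z1 > 0).astype(float)  # 1 where Z1 > 0, 0 where Z1 <= 0
```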

Now, the above derivatives are conditioned on Z_1, but by a simple deduction, I can rewrite these derivatives conditioned on H, as below:

H > 0; \frac{\partial{H}}{\partial{Z_1}} = 1 \\
H = 0; \frac{\partial{H}}{\partial{Z_1}} = 0

Now, as per ReLU’s definition, H can’t be less than 0, so it won’t hurt if I change the second condition above from `H = 0` to `H <= 0`. Rewriting the derivatives, we get:

H > 0; \frac{\partial{H}}{\partial{Z_1}} = 1 \\
H <= 0; \frac{\partial{H}}{\partial{Z_1}} = 0
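This equivalence is easy to verify in NumPy (the values below are made up, chosen to include the edge case Z_1 = 0):

```python
import numpy as np

# Made-up pre-activations; any real values work
Z1 = np.array([[-1.5, 0.0, 0.3, 2.0]])
H = np.maximum(0, Z1)

mask_from_Z1 = (Z1 > 0).astype(float)  # condition on the ReLU input
mask_from_H = (H > 0).astype(float)    # condition on the ReLU output

# Both conditions pick out exactly the same entries
assert np.array_equal(mask_from_Z1, mask_from_H)
```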

Now, if we take a close look at the conditions above, we will find that the derivative depends only on `H`, which happens to be the output of `ReLU`, and the condition `H > 0` is exactly the condition that defines `ReLU` itself. Carrying this over to backward propagation: the incoming gradient \frac{\partial{J}}{\partial{H}} simply gets masked element-wise by that condition, i.e., \frac{\partial{J}}{\partial{Z_1}} = \frac{\partial{J}}{\partial{H}} \odot \mathbb{1}[H > 0], and then \frac{\partial{J}}{\partial{W_1}} follows from \frac{\partial{Z_1}}{\partial{W_1}} = X. So the derivative \frac{\partial{H}}{\partial{Z_1}} can be defined entirely in terms of H, the cached forward output, and Z_1 is never needed in the backward pass.
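Putting the whole chain together, here is a minimal backward-pass sketch, assuming a two-layer network in the thread’s notation (Z_1, H, W_2) with a linear output layer and made-up shapes and random data; note that Z1 is never used after the forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up shapes: X is (n_x, m), W1 is (n_h, n_x), W2 is (n_y, n_h)
m, n_x, n_h, n_y = 4, 3, 5, 2
X = rng.standard_normal((n_x, m))
W1 = rng.standard_normal((n_h, n_x))
W2 = rng.standard_normal((n_y, n_h))
Y = rng.standard_normal((n_y, m))

# Forward pass; only H needs to be cached for the ReLU backward step
Z1 = W1 @ X
H = np.maximum(0, Z1)
Y_hat = W2 @ H  # linear output layer, an assumption of this sketch

# Backward pass: the ReLU mask comes from H, not Z1
dJ_dH = (1 / m) * W2.T @ (Y_hat - Y)
dJ_dZ1 = dJ_dH * (H > 0)   # element-wise mask, 1[H > 0]
dJ_dW1 = dJ_dZ1 @ X.T      # dZ1/dW1 = X
```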

This is what I guess could be one of the reasons, but I believe there should be an easier and more intuitive explanation. Let me tag in another mentor to shed some light here. Hey @arvyzukai, can you please take a look at this thread and let us know your take on this? Thanks in advance.

Cheers,

Elemento