Hi, I was wondering if someone could explain where these derivative formulas come from?

Hi Amir,

Welcome to the community!

What Prof Ng is expressing here is that dZ is shorthand for the partial derivative of L with respect to Z, so these formulas are just the chain rule in action.

We have our formulas as,

L(y,a) = -ylog(a) - (1-y)log(1-a)

a = sigma(z)
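For concreteness, here is a minimal Python sketch of these two formulas (names like `sigmoid` and `loss` are my own, not from the course):

```python
import numpy as np

def sigmoid(z):
    # a = sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def loss(y, a):
    # L(y, a) = -y*log(a) - (1-y)*log(1-a)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

z = 0.5
a = sigmoid(z)          # about 0.6225
print(loss(1.0, a))     # small loss when y = 1 and a > 0.5
```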

If we talk about the individual terms of the chain rule here, then we will have:

partial L/partial a = -y/a + (1-y)/(1-a)

partial a/partial z = a(1-a)

Multiplying these two gives dz = partial L/partial z = a - y.

Finally, we can have our full chain rule as:

partial L/partial w1 = (partial L/partial a)(partial a/partial z)(partial z/partial w1) = (a - y) x1

Note: partial z/partial w1 = x1
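You can also sanity-check that dL/dz = a - y with a finite-difference approximation. This is just a quick numerical sketch, not part of the course materials:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(y, z):
    a = sigmoid(z)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

y, z, eps = 1.0, 0.3, 1e-6

# Central finite-difference estimate of dL/dz
numeric = (loss(y, z + eps) - loss(y, z - eps)) / (2 * eps)

# Analytic result from the chain rule: dz = a - y
analytic = sigmoid(z) - y

print(numeric, analytic)  # the two values agree closely
```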

Hello!

Thank you for your answer.

I just can’t understand where w[2] comes from in the formula for dz[1] (the partial derivative for the hidden layer)?

For logistic regression we had: dz = a - y. For the output layer it’s the same: dz[2] = a[2] - y, where y, I guess, will be the actual labels from the training set.

I’m wondering what will play the role of y in the case of the hidden layer?

Thanks in advance!

Since ‘y’ is only applicable to the output layer, it doesn’t play a direct role in the hidden layers.

Only the errors from the output layers are passed back to the hidden layers.
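To illustrate that last point, here is a rough sketch of how the output-layer error dZ2 is carried back through W2 to form dZ1. All layer sizes, names, and the tanh hidden activation are assumptions for the example, not taken from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-layer network: 3 inputs -> 4 hidden units -> 1 output (assumed sizes)
n_x, n_h, m = 3, 4, 5
W1 = rng.standard_normal((n_h, n_x))
W2 = rng.standard_normal((1, n_h))
X = rng.standard_normal((n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Forward pass (tanh hidden activation, sigmoid output)
Z1 = W1 @ X
A1 = np.tanh(Z1)
Z2 = W2 @ A1
A2 = sigmoid(Z2)

# Backward pass: only the output error dZ2 involves Y ...
dZ2 = A2 - Y
# ... and the hidden layer receives that error through W2, scaled by g1'(Z1)
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)   # tanh'(z) = 1 - tanh(z)^2

print(dZ1.shape)  # one error value per hidden unit per example
```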

Ok, thank you for the answer!

So, is dz[1] for the hidden layer somehow expressed as a function of the output-layer weights w[2]?

Could you please show me how it is derived: dz[1] = w[2]T dz[2] * g[1]’ (z[1])?

I tried myself, without success.

Sorry, I can’t show that, I am not very good at differential calculus.
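For anyone else reading this thread: even without a symbolic derivation, the formula can be checked numerically. The sketch below (all names, shapes, and the tanh hidden activation are my assumptions) compares dz[1] = w[2]T dz[2] * g[1]’(z[1]) against a finite-difference estimate of the derivative of L with respect to z[1]:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward_loss(z1, w2, b2, y):
    # Treat z1 as the starting point: a1 = tanh(z1), then the usual output layer
    a1 = np.tanh(z1)
    a2 = sigmoid(w2 @ a1 + b2)
    return (-y * np.log(a2) - (1 - y) * np.log(1 - a2)).item()

rng = np.random.default_rng(1)
z1 = rng.standard_normal((4, 1))
w2 = rng.standard_normal((1, 4))
b2, y = 0.1, 1.0

# Analytic formula from the thread: dz1 = w2.T @ dz2 * g1'(z1)
a1 = np.tanh(z1)
a2 = sigmoid(w2 @ a1 + b2)
dz2 = a2 - y
dz1 = (w2.T @ dz2) * (1 - a1 ** 2)   # g1'(z1) for tanh

# Finite-difference check, one component of z1 at a time
eps = 1e-6
numeric = np.zeros_like(z1)
for i in range(z1.shape[0]):
    zp, zm = z1.copy(), z1.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (forward_loss(zp, w2, b2, y) - forward_loss(zm, w2, b2, y)) / (2 * eps)

print(np.max(np.abs(numeric - dz1)))  # tiny: the two gradients match
```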