Derivation of dL/dz

Originally written by: Edward Shyu :raised_hands:

This is optional material that you can read after the week 2 video “Gradient descent on m examples.” You don’t need to know calculus in order to complete this course (or the other courses in the specialization), so this derivation is optional. This is for those who are curious about where the “dz = a - y” comes from.

This can be more fun and easier to digest if you follow along with a pencil and paper!

Derivation of \frac{dL}{dz}

If you’re curious, here is the derivation for \frac{dL}{dz} = a - y

Note that in this part of the course, Andrew refers to \frac{dL}{dz} as dz.

By the chain rule: \frac{dL}{dz} = \frac{dL}{da} \times \frac{da}{dz}

We’ll do the following: 1. solve for \frac{dL}{da}, then 2. solve for \frac{da}{dz}, and 3. multiply the two results to get \frac{dL}{dz}.

Step 1: \frac{dL}{da}

L = -(y \times log(a) + (1-y) \times log(1-a))

\frac{dL}{da} = -y \times \frac{1}{a} - (1-y) \times \frac{1}{1-a} \times (-1)

We’re taking the derivative with respect to a.

Remember that there is an additional -1 in the last term when we take the derivative of (1-a) with respect to a (remember the Chain Rule). Also note that the notational conventions are different in the ML world than the math world: here log always means the natural log.

\frac{dL}{da} = \frac{-y}{a} + \frac{1-y}{1-a}

We’ll give both terms the same denominator:

\frac{dL}{da} = \frac{-y \times (1-a)}{a\times(1-a)} + \frac{a \times (1-y)}{a\times(1-a)}

Clean up the terms:

\frac{dL}{da} = \frac{-y + ay + a - ay}{a(1-a)}

So now we have:

\frac{dL}{da} = \frac{a - y}{a(1-a)}
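If you’d like to sanity-check Step 1 without pencil and paper, here is a minimal Python sketch that compares the formula above against a finite-difference approximation. The function names (loss, dL_da, numeric_dL_da), the step size eps, and the sample values of a and y are my own choices for illustration, not something from the course.

```python
import math

def loss(a, y):
    # Logistic loss for a single example; math.log is the natural log.
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def dL_da(a, y):
    # Closed form from Step 1: (a - y) / (a (1 - a)).
    return (a - y) / (a * (1 - a))

def numeric_dL_da(a, y, eps=1e-6):
    # Central-difference approximation of dL/da (eps is an assumed step size).
    return (loss(a + eps, y) - loss(a - eps, y)) / (2 * eps)

# Hypothetical sample values: a must lie strictly between 0 and 1.
for a, y in [(0.7, 1.0), (0.3, 0.0)]:
    print(a, y, dL_da(a, y), numeric_dL_da(a, y))
```

The two printed derivative values on each line should agree to several decimal places.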

Step 2: \frac{da}{dz}

\frac{da}{dz} = \frac{d}{dz} \sigma(z)

The derivative of a sigmoid has the form:

\frac{d}{dz}\sigma(z) = \sigma(z) \times (1 - \sigma(z))

You can look up why this derivative has this form. For example, search for “derivative of a sigmoid” and you can see the derivation in detail.

Recall that \sigma(z) = a, because we defined “a”, the activation, as the output of the sigmoid activation function.

So we can substitute into the formula to get:

\frac{da}{dz} = a (1 - a)
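Step 2 can be checked numerically in the same spirit. This is only a sketch: the names sigmoid, da_dz, and numeric_da_dz and the test points are assumptions made for illustration.

```python
import math

def sigmoid(z):
    # a = sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def da_dz(z):
    # Closed form from Step 2: a (1 - a), with a = sigma(z).
    a = sigmoid(z)
    return a * (1 - a)

def numeric_da_dz(z, eps=1e-6):
    # Central-difference approximation of d(sigma)/dz (eps is an assumed step size).
    return (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

# Hypothetical test points.
for z in (-2.0, 0.0, 1.5):
    print(z, da_dz(z), numeric_da_dz(z))
```

At z = 0 the activation is 0.5, so a (1 - a) is 0.25, which is the largest value the sigmoid’s slope takes.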

Step 3: \frac{dL}{dz}

We’ll multiply step 1 and step 2 to get the result.

\frac{dL}{dz} = \frac{dL}{da} \times \frac{da}{dz}

From step 1: \frac{dL}{da} = \frac{a - y}{a(1-a)}

From step 2: \frac{da}{dz} = a (1 - a)

\frac{dL}{dz} = \frac{a - y}{a(1-a)} \times a (1 - a)

Notice that we can cancel factors to get this:

\frac{dL}{dz} = a - y

In Andrew’s notation, he’s referring to \frac{dL}{dz} as dz.

So in the videos:

dz = a - y
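To check the end result, you can differentiate the loss with respect to z numerically and compare it with a - y. This is a minimal sketch under my own assumptions: the helper names (sigmoid, loss_from_z, dz, numeric_dz) and the sample inputs are illustrative, not code from the course notebooks.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_from_z(z, y):
    # Compose the two pieces: a = sigma(z), then L(a, y).
    a = sigmoid(z)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def dz(z, y):
    # The result of the derivation: dL/dz = a - y.
    return sigmoid(z) - y

def numeric_dz(z, y, eps=1e-6):
    # Central-difference approximation of dL/dz (eps is an assumed step size).
    return (loss_from_z(z + eps, y) - loss_from_z(z - eps, y)) / (2 * eps)

# Hypothetical (z, y) pairs.
for z, y in [(0.5, 1.0), (-1.2, 0.0), (2.0, 1.0)]:
    print(z, y, dz(z, y), numeric_dz(z, y))
```

If the derivation is right, the last two numbers on each printed line should match closely, even though the code never uses the intermediate factor a (1 - a).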


The derivative of a sigmoid has the form \frac{d}{dz}\sigma(z) = \sigma(z) \times (1 - \sigma(z)) because the derivative of the sigmoid function is \frac{e^{-z}}{(1+e^{-z})^2}, and with basic algebra we can rewrite that expression in terms of \sigma(z).
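For anyone who wants the “basic algebra” spelled out, here is one way to write it, starting from the definition of the sigmoid and taking the \frac{e^{-z}}{(1+e^{-z})^2} form as given:

```latex
\sigma(z) = \frac{1}{1+e^{-z}}

1 - \sigma(z) = \frac{1+e^{-z}}{1+e^{-z}} - \frac{1}{1+e^{-z}} = \frac{e^{-z}}{1+e^{-z}}

\frac{d}{dz}\sigma(z) = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}} \times \frac{e^{-z}}{1+e^{-z}} = \sigma(z) \times (1 - \sigma(z))
```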


Clear representation. :slight_smile:


hope it helps!



What’s that \frac etc. markup supposed to look like, and why is it showing up this way in this comment?


Hi, this is LaTeX notation: \frac{}{} represents a fraction, and \sigma is the Greek letter sigma. I’m sorry if I didn’t make myself understood.


No need to apologize. I figured it was LaTeX; I just wondered why it showed up as raw markup commands instead of rendered math.


Because they have not yet enabled the MathJax plugin for this Discourse instance. They are working on it, so stay tuned.

Update on 7/24/2021: MathJax is now enabled as of late June, so just bracket your LaTeX commands with a single $ on either side and it should render.
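As a small illustration of the bracketing described above, the final result of the derivation would be typed like this (a sketch of the input only; how it renders depends on the forum’s MathJax setup):

```latex
$\frac{dL}{dz} = a - y$
```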


Thanks for the explanation. It was really helpful.


For those who are curious, here is a derivation for the derivative of the sigmoid function: Derivative of the Sigmoid function | by Arc | Towards Data Science



Maybe this will help in understanding the derivation.


derivation of sigmoid.


Can you explain why there is an additional -1? How is the Chain Rule used to get that?

You are taking derivatives w.r.t. a, so the -1 comes from applying the Chain Rule to the expression (1 - a).
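Written out, with u as a temporary name for the inner expression (u is my own label, not from the original post):

```latex
u = 1 - a, \quad \frac{du}{da} = -1

\frac{d}{da} \log(u) = \frac{1}{u} \times \frac{du}{da} = \frac{1}{1-a} \times (-1)

\frac{d}{da}\left[(1-y) \times \log(1-a)\right] = (1-y) \times \frac{1}{1-a} \times (-1)
```

The (1-y) factor is a constant with respect to a, so it just carries through.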


about da/dz=a(1-a):


Hi,
This is the complete solution:
