Missing intuition on tanh in back-prop - Programming assignment 2

Hi,

Though I have completed this assignment I went back and am still not quite getting the intuition here where it states in the notes:

Where exactly does 1 - a^2 come from ? I mean tanh is calculated as:

[image: the formula for tanh]

No ?

Perhaps it is just my algebra failing me a bit here, but even then why not just use np.tanh() ?

I guess I am just not seeing it…

The point is that the derivative of tanh is:

g'(z) = (1 - tanh^2(z))

And of course we have:

A^{[1]} = tanh(Z^{[1]})

So by substitution we have:

g^{[1]}{'}(Z1) = (1 - A1^2)

Note that it would be mathematically correct to also write it this way:

g^{[1]}{'}(Z1) = (1 - np.tanh(Z1)**2)

But unfortunately there is a bug in the test case: it turns out that they just generated random values for all the A and Z values so with the cache inputs we have:

A1 \neq tanh(Z1)

They computed the expected values for the test case using the formula they showed in the instructions. But note that there is a logical reason for using A1 there: tanh involves exponentials, so it’s expensive to compute. You already have the value saved from forward propagation, so why not just use it and save some compute?
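To make that concrete, here is a minimal NumPy sketch (not the assignment's code; the shapes, the seed and the variable names other than A1 and Z1 are my own assumptions). It shows that the two formulas agree when A1 really is tanh(Z1), and disagree when A1 and Z1 are drawn independently, as in the test case:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Case 1: A1 really is tanh(Z1), as in a real forward pass.
Z1 = rng.standard_normal((4, 3))
A1 = np.tanh(Z1)
grad_from_cache = 1 - A1**2            # reuses the cached activation
grad_recomputed = 1 - np.tanh(Z1)**2   # recomputes tanh from Z1
same_when_consistent = np.allclose(grad_from_cache, grad_recomputed)

# Case 2: A1 and Z1 drawn independently, mimicking the test case.
A1_random = rng.standard_normal((4, 3))
same_when_random = np.allclose(1 - A1_random**2, 1 - np.tanh(Z1)**2)

print(same_when_consistent)  # True
print(same_when_random)      # False
```

So with a consistent cache the choice is purely about saving compute, but against the randomly generated cache only the A1 version matches the expected values.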

If your actual question is why the derivative of tanh works out that way, we can go through that as well. But that’s a lot of LaTeX, so I will save that until you explicitly request it. I’m sure there are other threads that show that, but I can’t find one in a quick forum search. If you actually know a little calculus (the product rule, the exponent rule and the chain rule), you should be able to work it out yourself.


A quick ChatGPT query produced this tanh derivative calculation:

As
\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}

Let y = \tanh(x). So,

y = \frac{e^x - e^{-x}}{e^x + e^{-x}}

Now, denote u = e^x and v = e^{-x}. Thus,
y = \frac{u - v}{u + v}

Then, using the quotient rule, the derivative of y with respect to x can be calculated as follows:

\frac{dy}{dx} = \frac{(u' - v')(u + v) - (u - v)(u' + v')}{(u + v)^2} = \frac{2(v \cdot u' - u \cdot v')}{(u + v)^2}

Where u' and v' are the derivatives of u and v with respect to x, respectively.

u' = e^x
v' = -e^{-x}

Substitute these into the formula:

\frac{dy}{dx} = \frac{2(e^{-x} \cdot e^x - e^x \cdot (-e^{-x}))}{(e^x + e^{-x})^2}

\frac{dy}{dx} = \frac{2(e^{x-x} + e^{x-x})}{(e^x + e^{-x})^2}

As e^0 = 1

\frac{dy}{dx} = \frac{2(1+1)}{(e^x + e^{-x})^2}

\frac{dy}{dx} = \frac{4}{(e^x + e^{-x})^2}

Since e^x + e^{-x} = 2\cosh(x), where \cosh(x) is the hyperbolic cosine function, we can substitute this back:

\frac{dy}{dx} = \frac{4}{(2\cosh(x))^2}

\frac{dy}{dx} = \frac{1}{\cosh^2(x)}

\frac{dy}{dx} = \text{sech}^2(x)

And \text{sech}^2(x) is equal to 1 - \text{tanh}^2(x), which follows from dividing the identity \cosh^2(x) - \sinh^2(x) = 1 by \cosh^2(x).
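If you would rather check that last identity numerically than take it on faith, here is a small sketch using only Python's math module. The function name identity_gap and the sample points are my own choices for illustration:

```python
import math

def identity_gap(x):
    """Absolute difference between sech^2(x) and 1 - tanh^2(x)."""
    sech_squared = 1.0 / math.cosh(x) ** 2
    one_minus_tanh_squared = 1.0 - math.tanh(x) ** 2
    return abs(sech_squared - one_minus_tanh_squared)

# Largest gap over a handful of arbitrary sample points.
sample_points = (-2.0, -0.5, 0.0, 0.7, 3.0)
max_gap = max(identity_gap(x) for x in sample_points)
print(max_gap)  # should be at floating point rounding level
```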


Cool! I always found the quotient rule a pain in the neck to remember, so I like using the product and exponent rule. Start by writing it this way:

g(z) = (e^z - e^{-z}) * (e^z + e^{-z})^{-1}

Now apply the product rule and the exponent rule and the chain rule:

g'(z) = (e^z + e^{-z}) * (e^z + e^{-z})^{-1} + (e^z - e^{-z}) * (-1) * (e^z + e^{-z})^{-2} * (e^z - e^{-z})

Rewrite that with fractions instead of negative exponents and you’re done:

g'(z) = \displaystyle \frac {(e^z + e^{-z})} {(e^z + e^{-z})} - \frac{(e^z - e^{-z})^2} {(e^z + e^{-z})^{2}}
g'(z) = 1 - \displaystyle \left ( \frac{e^z - e^{-z}} {e^z + e^{-z}} \right ) ^{2}

Of course I elided the chain rule calculations in the first product rule step. To show them individually:

\displaystyle \frac {d}{dz} (e^z - e^{-z}) = e^z - (-1) e^{-z} = e^z + e^{-z}
\displaystyle \frac {d}{dz} (e^z + e^{-z}) = e^z + (-1) e^{-z} = e^z - e^{-z}
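For anyone who wants a sanity check without doing the calculus, here is a rough sketch that compares the result g'(z) = 1 - tanh^2(z) against a central finite difference. The helper names, the step size h and the sample points are assumptions I picked for illustration:

```python
import math

def tanh_grad(z):
    """Analytic derivative of tanh: 1 - tanh^2(z)."""
    return 1.0 - math.tanh(z) ** 2

def central_difference(f, z, h=1e-5):
    """Numerical derivative estimate: (f(z + h) - f(z - h)) / (2h)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

# Largest disagreement between the two over a few arbitrary points.
sample_points = (-2.0, -0.3, 0.0, 1.0, 2.5)
max_error = max(
    abs(central_difference(math.tanh, z) - tanh_grad(z)) for z in sample_points
)
print(max_error)  # small, limited by the finite difference approximation
```

This is the same idea as gradient checking: the numerical estimate is only approximate, but it is close enough to catch a wrong formula.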


Thanks Paul and Saif.

To be honest, I had forgotten that when it comes to back prop it is derivatives, derivatives everywhere (!). I also just wasn’t sure why this ‘note’ sort of ‘cut to the chase’ with the formula, or where it was starting from.

It is clear to me now.

** I guess it was also because the activation function is g(Z), but ‘A’ is not part of Z; in fact it is the result of g(Z), so 1 - A^2 seemed circular to me at first.

Thanks.
-Anthony

The point is that this course is specifically designed not to require any calculus background. So you know from the general formula for back prop that you need the derivative of the activation function and they can’t assume you know how to figure it out yourself, so they just told you what it was.