I don’t quite understand the logic behind it. Though I guessed it on the second try, the note with the answer did not help much, specifically the f(x)=g(2x) part.
I have not taken M4ML, but this looks like the Chain Rule in action. The Chain Rule says that if you have a function h defined as the composition of two other functions:
h(x) = f(g(x))
Then the derivative of h is:
h'(x) = f'(g(x)) * g'(x)
In this case, it looks like the functions in the composition are:
g(x) = 2x
f(x) = e^x
h(x) = e^{2x}
So what would you get in this case by applying the Chain Rule as shown above?
g′(x) = 2
f′(g(x)) = f′(2x) = e^{2x} (since f′(x) = e^x)
h′(x) = f′(g(x)) * g′(x) = 2e^{2x}
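If you want to sanity-check that result numerically, here is a quick sketch in Python (my own illustration, not from the course; the names `h` and `h_prime` are just for this example). It compares the Chain Rule answer against a central-difference approximation of the derivative:

```python
import math

def h(x):
    # h(x) = f(g(x)) = e^(2x)
    return math.exp(2 * x)

def h_prime(x):
    # Chain Rule result: h'(x) = f'(g(x)) * g'(x) = 2 * e^(2x)
    return 2 * math.exp(2 * x)

x = 1.5
eps = 1e-6
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)  # central difference
print(numeric, h_prime(x))  # both come out ≈ 40.171
```

The two values agree to many decimal places, which is a handy trick whenever you are unsure about a derivative you worked out by hand.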
You made it much easier for me to understand.
Thanks a lot!
It takes a little while to get used to. I did maths to this level back in the day, but it’s been decades since I did this stuff, so I only remember odd bits here and there. I could remember the product rule but not this one.
It’s good to dust off your calculus knowledge here. The Chain Rule is absolutely central to everything we do here, because a Neural Network is just a huge function composition: at each layer, you have a linear transformation followed by a non-linear activation. Then the output of layer one becomes the input to layer two, and you get two more functions applied in sequence. And so forth. So you can think of forward propagation as composing functions like the layers of a big onion. Then in backward propagation, you take the gradients (derivatives), which is like peeling the onion one layer at a time, and the Chain Rule is in action at every step of the way.
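To make the onion analogy concrete, here is a minimal sketch of that idea in Python, assuming a tiny two-layer network with sigmoid activations and a single scalar input (the weights and variable names are made up for illustration, not from the course):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Tiny 2-layer network: each layer is a linear step followed by a
# non-linear activation, i.e. a composition of functions.
w1, b1 = 0.5, 0.1
w2, b2 = -1.2, 0.3

x = 2.0

# Forward propagation: composing the layers like the onion.
z1 = w1 * x + b1     # layer one, linear part
a1 = sigmoid(z1)     # layer one, activation
z2 = w2 * a1 + b2    # layer two, linear part
a2 = sigmoid(z2)     # layer two, activation = output

# Backward propagation: peel the onion with the Chain Rule.
# d(a2)/dx = sigmoid'(z2) * w2 * sigmoid'(z1) * w1
da2_dz2 = a2 * (1 - a2)   # sigmoid'(z2)
da1_dz1 = a1 * (1 - a1)   # sigmoid'(z1)
da2_dx = da2_dz2 * w2 * da1_dz1 * w1

# Numerical check, same central-difference trick as before.
def forward(x):
    return sigmoid(w2 * sigmoid(w1 * x + b1) + b2)

eps = 1e-6
numeric = (forward(x + eps) - forward(x - eps)) / (2 * eps)
print(da2_dx, numeric)  # should agree closely
```

Each factor in `da2_dx` is one "layer" of the peeling: derivative of the outer activation, then the outer linear part, then the inner activation, then the inner linear part. Real backprop does exactly this, just with matrices instead of scalars.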