DLS Course 2 - Week 1 - Checking your derivative computation

Hello,
This is my first post, so I'll take the opportunity to thank the whole deeplearning.ai team for these great materials.

My question concerns the slide “Checking your derivative computation”, the limit calculation, and the justification behind saying that the approximation \frac{f(x+h) - f(x)}{h} (as h \rightarrow 0) has an error of O(h), while \frac{f(x+h) - f(x-h)}{2h} (as h \rightarrow 0) has an error of O(h^2).

For the first estimation, f(x+h) - f(x), using the Taylor expansion f(x+h) = f(x) + f'(x)h + \frac{1}{2}f''(x)h^2 + O(h^3), it seems we could also say that \frac{f(x+h) - f(x)}{h} is O(h^2). So is there any justification for why the estimation is different in the two cases?
Thank you

Hey @atriki,
Welcome to the community. I believe there is a small confusion regarding the statement in your post.

Prof Andrew is not saying exactly that. Instead, he says that when we approximate the gradient with the two-sided formula

\frac{f(\theta + \epsilon) - f(\theta - \epsilon)}{2\epsilon}

then, the error in the estimation of the gradient is of the order O(\epsilon^2), but when we use the one-sided formula, i.e.,

\frac{f(\theta + \epsilon) - f(\theta)}{\epsilon}

then, the error in the estimation of the gradient is of the order O(\epsilon). I hope this makes sense, since Prof Andrew demonstrated this explicitly in the lecture videos. If not, then please review the lecture videos once again.
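
If it helps to see this concretely, here is a minimal Python sketch (my own toy illustration, not code from the course; the function f(\theta) = \theta^3 and the point \theta = 1.5 are arbitrary choices) that compares the errors of the two formulas:

```python
def f(theta):
    # Toy scalar function with a known derivative: f(theta) = theta**3
    return theta ** 3

def f_prime(theta):
    # Analytical derivative used as ground truth: f'(theta) = 3 * theta**2
    return 3 * theta ** 2

theta = 1.5
epsilon = 1e-4

# One-sided difference: its error shrinks roughly like epsilon
one_sided = (f(theta + epsilon) - f(theta)) / epsilon

# Two-sided difference: its error shrinks roughly like epsilon**2
two_sided = (f(theta + epsilon) - f(theta - epsilon)) / (2 * epsilon)

true_grad = f_prime(theta)
print(f"one-sided error: {abs(one_sided - true_grad):.2e}")  # roughly 4.5e-04, i.e. O(epsilon)
print(f"two-sided error: {abs(two_sided - true_grad):.2e}")  # roughly 1.0e-08, i.e. O(epsilon^2)
```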

Let us know if this helps.

Cheers,
Elemento

Hi Elemento,
thank you for your clarification.
Then is it possible, please, to have some additional (more mathematical) explanation of why the two calculations give errors of O(\epsilon) and O(\epsilon^2)? Not only through specific numerical calculations, as shown during the course.
Thank you
Cheers,
Amine

Hey @atriki,
I guess there is still a small confusion. We don’t want the errors to be of the order O(\epsilon) and O(\epsilon^2), i.e., this is not something that we are aiming for. We are just using the different formulations to calculate the derivatives, and when we use these different formulations, these are the orders of the errors that we observe. I hope this helps.

Cheers,
Elemento

I would state it a bit differently: we’re trying to find the best algorithm for doing “finite difference” approximations to gradients, so we can use that to confirm that our gradient logic is correct. If you have a choice between a method that gives errors of O(\epsilon) and a method that gives errors of O(\epsilon^2), then it’s clear which one is superior, right? Remember that the whole point is that \epsilon is very small. What happens if you square 10^{-7}? Wait for it … it gets a lot smaller. :nerd_face:

Now I realize that doesn’t really answer the fundamental question originally asked here, which is why the two different ways to compute finite difference derivative estimates have those error behaviors. That requires some math, but if you already know about Taylor Expansion you should be set for that. Here’s a document from Brown University which uses Taylor Series to show the behavior of one-sided and two-sided finite differences.
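
To summarize the gist of that derivation here, write out the Taylor expansions of f around \theta:

f(\theta + \epsilon) = f(\theta) + f'(\theta)\epsilon + \frac{1}{2}f''(\theta)\epsilon^2 + O(\epsilon^3)

f(\theta - \epsilon) = f(\theta) - f'(\theta)\epsilon + \frac{1}{2}f''(\theta)\epsilon^2 + O(\epsilon^3)

The one-sided quotient keeps the second-order term:

\frac{f(\theta + \epsilon) - f(\theta)}{\epsilon} = f'(\theta) + \frac{1}{2}f''(\theta)\epsilon + O(\epsilon^2) = f'(\theta) + O(\epsilon)

In the two-sided quotient, the \frac{1}{2}f''(\theta)\epsilon^2 terms cancel when we subtract, so:

\frac{f(\theta + \epsilon) - f(\theta - \epsilon)}{2\epsilon} = f'(\theta) + O(\epsilon^2)

That is exactly the difference asked about in the original question.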

Note that I did an edit to my previous reply to include a link to the derivation of why the two-sided finite differences behave better. Please have a look and see if that answers your question.

Thanks a lot @paulinpaloalto Sir, for correcting my explanation; the source that you mentioned has indeed made this concept crystal clear. As per the source, we can use other formulations as well to make the error estimates even smaller.
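
For example, one standard higher-order formulation (not covered in the course, shown here just for illustration) combines four evaluations so that even more of the Taylor terms cancel:

f'(\theta) \approx \frac{-f(\theta + 2\epsilon) + 8f(\theta + \epsilon) - 8f(\theta - \epsilon) + f(\theta - 2\epsilon)}{12\epsilon}

whose error is of the order O(\epsilon^4).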

Cheers,
Elemento

Thank you for the link and the answer. This is exactly what I tried to understand/find using the Taylor series expansions.