Numerical Approximation

Hello,
I have a question about the numerical approximation and the examples Prof Andrew gave. I don’t understand why we take J(w+\epsilon) - J(w-\epsilon) rather than J(w+\epsilon) - J(w), because the derivative is normally defined mathematically as the limit of \frac{J(w+\epsilon) - J(w)}{\epsilon} as \epsilon \rightarrow 0. That seems to make more sense than \frac{J(w+\epsilon) - J(w-\epsilon)}{2\epsilon}, doesn’t it?

It’s been a while since I watched these lectures, but I’m pretty sure Andrew comments on that in them. It’s the difference between a “one-sided” difference and a “two-sided” difference. When we are doing “real” calculus over \mathbb{R} and taking limits as \Delta x \rightarrow 0, we don’t really have to worry, because we have infinite resolution. But when we’re operating in the limited world of 64-bit floating point, with literally only 2^{64} numbers that we can represent between -\infty and +\infty, we do have to worry about this. It just turns out that when you are estimating derivatives with finite differences, the two-sided difference gives you better convergence behavior, as the sketch below shows. This matters in practice, because packages like TF and PyTorch use the same kind of finite-difference approximation to check analytically computed gradients (e.g., PyTorch’s torch.autograd.gradcheck), so this behavior has been studied carefully.
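
To make “better convergence behavior” concrete, here is the standard Taylor-expansion argument (just a sketch, not from the lecture itself):

J(w+\epsilon) = J(w) + \epsilon J'(w) + \frac{\epsilon^2}{2} J''(w) + O(\epsilon^3)
J(w-\epsilon) = J(w) - \epsilon J'(w) + \frac{\epsilon^2}{2} J''(w) + O(\epsilon^3)

So the one-sided estimate \frac{J(w+\epsilon)-J(w)}{\epsilon} = J'(w) + \frac{\epsilon}{2} J''(w) + O(\epsilon^2) has error O(\epsilon), while in the two-sided estimate \frac{J(w+\epsilon)-J(w-\epsilon)}{2\epsilon} = J'(w) + O(\epsilon^2) the J''(w) terms cancel, leaving error O(\epsilon^2). With \epsilon = 10^{-4}, that is roughly the difference between 4 and 8 correct digits, before round-off error starts to bite.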

Here’s a thread with a relevant question from a different course, which shows some experiments that demonstrate the behavior. You can probably also find articles on Wolfram or MathWorks by googling “two-sided difference”.
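
If you want to reproduce the kind of experiment that thread shows, here is a minimal NumPy sketch (the cost function J and the epsilon values are just illustrative choices, not anything from the course or that thread):

```python
import numpy as np

# Illustrative cost function with a known analytic derivative.
def J(w):
    return np.sin(w) + w**3

def J_prime(w):
    return np.cos(w) + 3 * w**2

w = 1.0
true_grad = J_prime(w)

print(f"{'epsilon':>10} {'one-sided error':>18} {'two-sided error':>18}")
for eps in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    one_sided = (J(w + eps) - J(w)) / eps              # error ~ O(eps)
    two_sided = (J(w + eps) - J(w - eps)) / (2 * eps)  # error ~ O(eps^2)
    print(f"{eps:>10.0e} {abs(one_sided - true_grad):>18.2e} "
          f"{abs(two_sided - true_grad):>18.2e}")
```

You should see the one-sided error shrink roughly linearly with \epsilon and the two-sided error shrink roughly quadratically, until round-off error takes over for very small \epsilon.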

Got it, thank you!

Yes, Prof Ng does discuss this point starting at about 1:30 in the first lecture on this in DLS C2 W1.