Gradient Checking Normalization

CCCC · August 23, 2021, 6:10pm

Why do we need to divide ||dθapprox - dθ|| by the sum of the lengths of these two vectors, ||dθapprox|| + ||dθ||? What happens if we don't?

Caleb · August 23, 2021, 10:00pm

This is done so that the check works for both small and large values.

For example, assume we want to measure the difference between d1 and d2 (dThetaApprox and dTheta here).

Case 1
d1 = 10
d2 = 110
The difference here is 100, but d2 is 11 times d1. A huge difference.
Check = ((110-10)^2)/(110^2+10^2) ≈ 0.82

Case 2
d1 = 1010
d2 = 1110
The difference is the same, 100, but relative to the size of the values it is much smaller.
Check = ((1110-1010)^2)/(1110^2+1010^2) ≈ 0.004

The denominator is just a scaling factor. It takes the actual magnitudes of the values into account and scales the difference so that two checks can be compared meaningfully even when the values are orders of magnitude apart.

It can be roughly thought of as a relative (percentage) difference.
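
For anyone who wants to try the two cases numerically, here is a minimal sketch. It assumes NumPy and uses the unsquared ratio from the question, ||dθapprox - dθ|| / (||dθapprox|| + ||dθ||), rather than the squared Check above, so the numbers differ but the trend is the same. The function name `relative_difference` is made up for this illustration, and the sketch does not guard against both vectors being zero.

```python
import numpy as np

def relative_difference(grad_approx, grad):
    """Normalized distance between two gradient vectors (hypothetical helper)."""
    grad_approx = np.asarray(grad_approx, dtype=float)
    grad = np.asarray(grad, dtype=float)
    numerator = np.linalg.norm(grad_approx - grad)                    # ||d1 - d2||
    denominator = np.linalg.norm(grad_approx) + np.linalg.norm(grad)  # ||d1|| + ||d2||
    return numerator / denominator

# Same absolute difference (100) in both cases, very different relative difference.
case1 = relative_difference([10.0], [110.0])      # 100 / 120  ≈ 0.8333
case2 = relative_difference([1010.0], [1110.0])   # 100 / 2120 ≈ 0.0472

# Rescaling both vectors by the same factor leaves the ratio unchanged,
# whereas the raw difference would grow by that factor.
case1_scaled = relative_difference([10_000.0], [110_000.0])

print(case1, case2, case1_scaled)
```

Without the denominator, both cases would report 100, and a single threshold could not tell a badly wrong gradient from a nearly correct one.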
