Why do we need to divide ||dθapprox - dθ|| by the sum of the lengths of these two vectors, ||dθapprox|| + ||dθ||? What happens if we don't?

This is done so that the formula works for both small and large values.

For example

Assume we want to measure the difference between d1 and d2 (dThetaApprox and dTheta here).

**Case 1**

d1 = 10

d2 = 110

The difference here is 100, but d2 is 11 times d1. A huge difference.

Check = ((110-10)^2)/((110)^2+10^2) ≈ 0.82

**Case 2**

d1 = 1010

d2 = 1110

The difference is the same, 100, but relative to the scale of the values it is much smaller.

Check = ((1110-1010)^2)/(1110^2+1010^2) ≈ 0.0044

The denominator is just a scaling factor that takes the actual magnitudes of the values into account. Without it, both cases above would report the same raw difference of 100. With it, the difference is scaled so that two checks can be compared meaningfully even when the underlying values are orders of magnitude apart.

**It can be roughly thought of as a percentage difference.**
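To make the two cases concrete, here is a minimal Python sketch of the normalized difference from the question, ||dθapprox - dθ|| / (||dθapprox|| + ||dθ||). The function names `l2_norm` and `relative_difference` are my own, not from any course code, and returning 0.0 when both vectors are zero is an assumption on my part. Note that the Check values above use squared terms, so the numbers here differ (roughly 0.83 and 0.047), but the conclusion is the same.

```python
import math

def l2_norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def relative_difference(d_approx, d_exact):
    """||d_approx - d_exact|| / (||d_approx|| + ||d_exact||)."""
    numerator = l2_norm([a - b for a, b in zip(d_approx, d_exact)])
    denominator = l2_norm(d_approx) + l2_norm(d_exact)
    if denominator == 0.0:
        # Assumption: two all-zero vectors are treated as identical.
        return 0.0
    return numerator / denominator

# Case 1: raw difference 100, values of very different scale.
case_1 = relative_difference([10.0], [110.0])     # 100 / 120
# Case 2: same raw difference 100, values of similar scale.
case_2 = relative_difference([1010.0], [1110.0])  # 100 / 2120

print(case_1, case_2)
```

The raw difference is 100 in both cases, yet the normalized value is much larger for Case 1, which is exactly the distinction the scaling is meant to expose.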
