Greetings!
What is the root cause of the difference between the actually calculated results and the expected results (see below)?
Thank you.
Those are numerical accuracy differences, possibly due to slightly different implementations. In practice, those two results are equivalent.
For example, if you use np.power() instead of **2, you will get slightly different results.
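Here is a minimal sketch of that effect (there is nothing special about these particular numbers; any reordering of mathematically equivalent operations can do it), using np.isclose to confirm the results are equivalent in practice:
import numpy as np

# Two mathematically identical sums, evaluated in a different order.
x = (0.1 + 0.2) + 0.3   # prints as 0.6000000000000001
y = 0.1 + (0.2 + 0.3)   # prints as 0.6

print(x == y)            # False: the last bits differ
print(np.isclose(x, y))  # True: equal within the default tolerances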
Got it.
How many decimal digits are safe to consider when comparing results?
There's no hard answer.
It depends on the resolution of the data type. For example, 64-bit floats have higher resolution than 32-bit floats.
It also depends on the complexity of the calculations if you're comparing two different methods. Every math operation can inject truncation and round-off errors.
This is a useful article on the concept.
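To make that concrete, here is a hedged little experiment (plain NumPy; the iteration count of 30 is arbitrary): a chain of operations that should return exactly 2.0 in real arithmetic drifts away from it, and np.isclose with an explicit tolerance is the usual way to compare results instead of counting digits by eye.
import numpy as np

y = np.float64(2.0)
for _ in range(30):
    y = np.sqrt(y)   # each sqrt rounds the result to the nearest float64
for _ in range(30):
    y = y * y        # each squaring rounds again on the way back up

print(y)                          # typically close to, but not exactly, 2.0
print(y == 2.0)                   # usually False
print(np.isclose(y, 2.0))         # True under the default rtol=1e-05, atol=1e-08
print(np.isclose(y, 2.0, rtol=100 * np.finfo(np.float64).eps, atol=0.0))  # may be False: the drift can exceed a few ulps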
If you read the very informative link that Tom gave, you get the following info:
For Python float32 (which is equivalent to binary32 in the article, as defined in the IEEE 754 spec), the resolution of the mantissa is of the order 10^{-7}, so 6 or 7 decimal places is the raw accuracy of the representation.
For Python float64 (equivalent to binary64), the resolution is of the order 10^{-16}, so 15 or 16 decimal places.
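If you want to check those figures directly, NumPy reports them via np.finfo (a small sketch):
import numpy as np

for dt in (np.float32, np.float64):
    info = np.finfo(dt)
    # eps: spacing between 1.0 and the next representable value
    # precision: approximate number of reliable decimal digits
    print(dt.__name__, info.eps, info.precision)

# prints roughly: float32 1.19e-07 6  and  float64 2.22e-16 15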
As Tom says, though, these are not hard and fast rules. In DL training, for example, we do incredibly big chained calculations so the errors that you can end up with may be bigger than the fundamental resolution of the datatype you are using: if you're lucky, the computations are "numerically stable" and the rounding errors statistically cancel out in most cases. But sometimes you get unlucky and they accumulate instead of canceling out.
You can do some experiments and see this behavior. Suppose you want to compute the average of the values in a vector. It turns out that you get slightly different answers from
\displaystyle \frac {1}{m} \sum_{i = 1}^{m} v_i
and
\displaystyle \sum_{i = 1}^{m} \frac {1}{m} v_i
even though the two are equivalent mathematically. If you could do the calculations in \mathbb{R}, the answers would be the same, but we don't have that luxury. Here's an example:
import numpy as np

np.random.seed(42)
m = 10000
a = np.random.randn(1, m)
z1 = np.sum(np.exp(a) / m)       # divide each term by m, then sum
z2 = 1 / m * np.sum(np.exp(a))   # sum first, then divide by m
z3 = np.mean(np.exp(a))          # NumPy's built-in mean
print(f"a.dtype {a.dtype}")
print("z1 = {:0.17f}\nz2 = {:0.17f}\nz3 = {:0.17f}".format(z1, z2, z3))
Which gives this:
a.dtype float64
z1 = 1.65292081707434368
z2 = 1.65292081707434346
z3 = 1.65292081707434346
You can see that they differ in the 16th decimal place.
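To see the "accumulate instead of cancel" case, here is a hedged sketch (the array size and the constant 0.1 are arbitrary choices) where a strictly sequential float32 accumulation drifts, while NumPy's pairwise np.sum and a float64 accumulator stay much closer to the exact total:
import numpy as np

x = np.full(1_000_000, 0.1, dtype=np.float32)

total = np.float32(0.0)
for v in x:                        # strictly sequential float32 accumulation
    total += v

print(total)                        # typically off from the exact total by a noticeable amount
print(np.sum(x))                    # pairwise summation stays much closer
print(np.sum(x, dtype=np.float64))  # accumulating in float64 is closer still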
@paulinpaloalto I particularly liked this reply Paul, and it is an important reminder: when it comes to computers or calculators, at best we are not exactly "doing math" but rather "estimating" to the best of our bit resolution.
There are ways around that, of course, but most applications don't even bother; thus the world we live in is filled with "a slight tinge of error".
Also, different subject, but in some ways related, it makes me think of a paper Karpathy recently highlighted: x.com
It turns out that the idea of approximating solutions of equations has been around for a lot longer than ML/DL/AI. It goes back at least as far as Isaac Newton, the inventor of calculus, who first described the eponymous Newton's Method. It also arises naturally when we try to apply physics in the real world. In many cases, the behavior we care about is described by differential equations that do not have "closed form" solutions. So we have no choice but to approximate the solutions. Aerodynamics and orbital mechanics are important examples. Everything we do in floating point is just an approximation of the actual mathematics, but if done well it can be "close enough for jazz". We've proven that you can land a spacecraft on Mars with floating point computations approximating the real mathematics.
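For anyone who has not seen it, here is a minimal sketch of Newton's Method applied to f(x) = x^2 - 2, i.e. approximating sqrt(2); the update x \leftarrow x - f(x)/f'(x) simplifies to x \leftarrow (x + 2/x)/2:
x = 1.0                      # initial guess
for _ in range(6):
    x = 0.5 * (x + 2.0 / x)  # Newton update for f(x) = x**2 - 2
    print(x)
# converges quadratically to 1.4142135623730951, the closest float64 to sqrt(2)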
Presuming, of course, that we're using the same units of measure for planning and implementation:
Dear colleagues,
I enjoyed reading the great discussion above.
I see the root cause now. However, I think it is always worth checking (or at least understanding) the effective number of meaningful digits and reporting values truncated to that boundary.
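One hedged way to do that in NumPy (the report helper here is just for illustration) is to format with the number of decimal digits that np.finfo says the dtype can reliably carry:
import numpy as np

def report(x):
    # np.finfo(...).precision is 6 for float32 and 15 for float64
    digits = np.finfo(x.dtype).precision
    return f"{x:.{digits}g}"

z = np.float64(1.65292081707434368)
print(report(z))               # 15 significant digits
print(report(np.float32(z)))   # 6 significant digits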
The folks who create the ML tutorials are often unfamiliar with the science and engineering concepts of numerical precision, numerical accuracy, and maintaining the correct number of significant figures.
@AKazak you will witness this even in courses. If you make a mistake at the level we are talking about, your gradients will, even factorially, "overflow".