C1_W2_Lab02: result difference in 4

Greetings!

What is the root cause of the difference between the actually calculated results and the expected results (see below)?

Thank you.

Those are numerical accuracy differences, possibly due to slightly different implementations. In practice, the two results are equivalent.

For example, if you use np.power() instead of **2, you will get slightly different results.
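
A quick way to check this on your own machine (a minimal sketch; whether the difference is exactly zero or just tiny can depend on your NumPy version and the dtype):

import numpy as np

x = np.random.randn(1000).astype(np.float32)
# Largest element-wise discrepancy between the two ways of squaring
print(np.max(np.abs(np.power(x, 2) - x**2)))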

Got it.
How many decimal digits are safe to consider when comparing results?

There’s no hard answer.

It depends on the resolution of the data type. For example, 64-bit floats have higher resolution than 32-bit floats.

It also depends on the complexity of the calculations if you’re comparing two different methods. Every math operation can inject truncation and round-off errors.
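
Even a single addition shows this with plain Python floats (IEEE 754 binary64, so the output below is deterministic):

# 0.1 and 0.2 cannot be represented exactly in binary,
# so the rounded sum is not the closest double to 0.3
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False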

This is a useful article on the concept.

If you read the very informative link that Tom gave, you get the following info:

For Python float32 (equivalent to binary32 in the article, as defined in the IEEE 754 spec), the resolution of the mantissa is of the order of 10^{-7}, so 6 or 7 decimal places is the raw accuracy of the representation.

For Python float64 (equivalent to binary64), the resolution is of the order of 10^{-16}, so 15 or 16 decimal places.
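
You can query those resolutions directly with np.finfo; the eps and precision values below come from the IEEE 754 formats themselves, so they should be the same on any platform:

import numpy as np

# Machine epsilon: spacing between 1.0 and the next representable value
print(np.finfo(np.float32).eps)        # ~1.19e-07
print(np.finfo(np.float64).eps)        # ~2.22e-16

# Approximate number of reliable decimal digits
print(np.finfo(np.float32).precision)  # 6
print(np.finfo(np.float64).precision)  # 15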

As Tom says, though, these are not hard and fast rules. In DL training, for example, we do incredibly big chained calculations so the errors that you can end up with may be bigger than the fundamental resolution of the datatype you are using: if you’re lucky, the computations are “numerically stable” and the rounding errors statistically cancel out in most cases. But sometimes you get unlucky and they accumulate instead of canceling out.
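
For a concrete sketch of the “accumulate” case, compare summing the same float32 array sequentially, with NumPy’s pairwise np.sum, and against a float64 reference (the exact numbers you see may vary slightly with your NumPy build):

import numpy as np

x = np.full(10_000_000, 0.1, dtype=np.float32)   # the exact sum would be 1,000,000

naive = np.cumsum(x)[-1]                 # sequential left-to-right float32 accumulation
pairwise = np.sum(x)                     # NumPy's pairwise summation, still float32
reference = np.sum(x, dtype=np.float64)  # accumulate in float64 as a reference

# The sequential float32 total drifts far from 1,000,000, while the
# pairwise float32 sum stays much closer to the float64 reference.
print(naive, pairwise, reference)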

You can do some experiments and see this behavior. Suppose you want to compute the average of the values in a vector. It turns out that you get slightly different answers from

\displaystyle \frac {1}{m} \sum_{i = 1}^{m} v_i

and

\displaystyle \sum_{i = 1}^{m} \frac {1}{m} v_i

even though the two are equivalent mathematically. If you could do the calculations in \mathbb{R}, the answers would be the same, but we don’t have that luxury. Here’s an example:

import numpy as np

np.random.seed(42)
m = 10000
a = np.random.randn(1, m)
z1 = np.sum(np.exp(a) / m)      # divide each element by m, then sum
z2 = 1 / m * np.sum(np.exp(a))  # sum first, then multiply by 1/m
z3 = np.mean(np.exp(a))         # let NumPy compute the mean directly
print(f"a.dtype {a.dtype}")
print("z1 = {:0.17f}\nz2 = {:0.17f}\nz3 = {:0.17f}".format(z1, z2, z3))

Which gives this:

a.dtype float64
z1 = 1.65292081707434368
z2 = 1.65292081707434346
z3 = 1.65292081707434346

You can see that they differ in the 16th decimal place.
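
This is also why, in practice, you normally compare results with a tolerance rather than by counting matching digits, e.g. with np.isclose or np.allclose (continuing with z1, z2, z3 from the snippet above; the rtol/atol shown are NumPy’s defaults, which you can tighten or loosen as needed):

# True when |z1 - z2| <= atol + rtol * |z2|
print(np.isclose(z1, z2, rtol=1e-05, atol=1e-08))  # True
print(np.allclose([z1, z2], [z2, z3]))             # True, same rule applied element-wise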


@paulinpaloalto I particularly liked this reply, Paul, and it is an important reminder: when it comes to computers or calculators, at best we are not exactly ‘doing math’ but rather ‘estimating’ to the best of our bit resolution.

There are ways around that, of course, but most applications don’t even bother, so the world we live in is filled with ‘a slight tinge of error’.

Also, on a different but related subject, it makes me think of a paper Karpathy recently highlighted: x.com


It turns out that the idea of approximating solutions of equations has been around for a lot longer than ML/DL/AI. It goes back at least as far as Isaac Newton, the inventor of calculus who first described the eponymous Newton’s Method. It also arises naturally when we try to apply physics in the real world. In many cases, the behavior we care about is described by differential equations that do not have “closed form” solutions. So we have no choice but to approximate the solutions. Aerodynamics and orbital mechanics are important examples. Everything we do in floating point is just an approximation of the actual mathematics, but if done well it can be “close enough for jazz”. We’ve proven that you can land a spacecraft on Mars with floating point computations approximating the real mathematics.


Presuming, of course, that we’re using the same units of measure for planning and implementation.


Dear colleagues,

I enjoyed reading the great discussion above.

I see the root cause now. However, I think it is always worth checking (or at least understanding) the effective number of meaningful digits and reporting values truncated at that boundary.
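
One way to make that routine is to format results to only the number of significant digits the dtype can reliably carry (a sketch; np.finfo(...).precision reports that digit count, and the report() helper below is just an illustration):

import numpy as np

def report(value, dtype=np.float64):
    # Decimal digits the dtype is reliably precise to: 6 for float32, 15 for float64
    digits = np.finfo(dtype).precision
    return f"{value:.{digits}g}"

print(report(1.65292081707434368))               # 1.65292081707434
print(report(1.65292081707434368, np.float32))   # 1.65292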

The folks who create the ML tutorials are often unfamiliar with the science and engineering concepts of numerical precision, numerical accuracy, and maintaining the correct number of significant figures.


@AKazak you will witness this even in courses. If you make a mistake at the levels we are talking about, your gradients will, even factorially, ‘overflow’.