C1_W2_Lab02: result difference in 4

Greetings!

What is the root cause of the difference between the actually calculated results and the expected results (see below)?

Thank you.

Those are numerical accuracy differences, possibly due to slightly different implementations. In practice, the two results are equivalent.

For example, if you use np.power() instead of **2, you will get slightly different results.
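
A quick way to check this on your own machine (a minimal sketch; whether the difference is exactly zero or just tiny can depend on your NumPy version and the dtype):

import numpy as np

x = np.random.randn(1000).astype(np.float32)
# Largest element-wise discrepancy between the two ways of squaring
print(np.max(np.abs(np.power(x, 2) - x**2)))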

Got it.
How many decimal digits are safe to consider when comparing results?

There’s no hard answer.

It depends on the resolution of the data type. For example, 64-bit floats have higher resolution than 32-bit floats.

It also depends on the complexity of the calculations if you’re comparing two different methods. Every math operation can inject truncation and round-off errors.
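
Even a single addition shows this with plain Python floats (IEEE 754 binary64, so the output below is deterministic):

# 0.1 and 0.2 cannot be represented exactly in binary,
# so the rounded sum is not the closest double to 0.3
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False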

This is a useful article on the concept.

If you read the very informative link that Tom gave, you get the following info:

For Python float32 (equivalent to binary32 in the article, as defined in the IEEE 754 spec), the resolution of the mantissa is of the order of 10^{-7}, so 6 or 7 decimal places is the raw accuracy of the representation.

For Python float64 (equivalent to binary64), the resolution is of the order of 10^{-16}, so 15 or 16 decimal places.
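
You can query those resolutions directly with np.finfo; the eps and precision values below come from the IEEE 754 formats themselves, so they should be the same on any platform:

import numpy as np

# Machine epsilon: spacing between 1.0 and the next representable value
print(np.finfo(np.float32).eps)        # ~1.19e-07
print(np.finfo(np.float64).eps)        # ~2.22e-16

# Approximate number of reliable decimal digits
print(np.finfo(np.float32).precision)  # 6
print(np.finfo(np.float64).precision)  # 15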

As Tom says, though, these are not hard and fast rules. In DL training, for example, we do incredibly big chained calculations so the errors that you can end up with may be bigger than the fundamental resolution of the datatype you are using: if you’re lucky, the computations are “numerically stable” and the rounding errors statistically cancel out in most cases. But sometimes you get unlucky and they accumulate instead of canceling out.
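
For a concrete sketch of the “accumulate” case, compare summing the same float32 array sequentially, with NumPy’s pairwise np.sum, and against a float64 reference (the exact numbers you see may vary slightly with your NumPy build):

import numpy as np

x = np.full(10_000_000, 0.1, dtype=np.float32)   # the exact sum would be 1,000,000

naive = np.cumsum(x)[-1]                 # sequential left-to-right float32 accumulation
pairwise = np.sum(x)                     # NumPy's pairwise summation, still float32
reference = np.sum(x, dtype=np.float64)  # accumulate in float64 as a reference

# The sequential float32 total drifts far from 1,000,000, while the
# pairwise float32 sum stays much closer to the float64 reference.
print(naive, pairwise, reference)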

You can do some experiments and see this behavior. Suppose you want to compute the average of the values in a vector. It turns out that you get slightly different answers from

\displaystyle \frac {1}{m} \sum_{i = 1}^{m} v_i

and

\displaystyle \sum_{i = 1}^{m} \frac {1}{m} v_i

even though the two are equivalent mathematically. If you could do the calculations in \mathbb{R}, the answers would be the same, but we don’t have that luxury. Here’s an example:

import numpy as np

np.random.seed(42)
m = 10000
a = np.random.randn(1, m)
z1 = np.sum(np.exp(a) / m)      # divide each element by m, then sum
z2 = 1 / m * np.sum(np.exp(a))  # sum first, then multiply by 1/m
z3 = np.mean(np.exp(a))         # let NumPy compute the mean directly
print(f"a.dtype {a.dtype}")
print("z1 = {:0.17f}\nz2 = {:0.17f}\nz3 = {:0.17f}".format(z1, z2, z3))

Which gives this:

a.dtype float64
z1 = 1.65292081707434368
z2 = 1.65292081707434346
z3 = 1.65292081707434346

You can see that they differ in the 16th decimal place.
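
This is also why, in practice, you normally compare results with a tolerance rather than by counting matching digits, e.g. with np.isclose or np.allclose (continuing with z1, z2, z3 from the snippet above; the rtol/atol shown are NumPy’s defaults, which you can tighten or loosen as needed):

# True when |z1 - z2| <= atol + rtol * |z2|
print(np.isclose(z1, z2, rtol=1e-05, atol=1e-08))  # True
print(np.allclose([z1, z2], [z2, z3]))             # True, same rule applied element-wise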


@paulinpaloalto I particularly liked this reply, Paul, and it is an important reminder: when it comes to computers or calculators, at best we are not exactly ‘doing math’ but rather ‘estimating’ to the best of our bit resolution.

There are ways around that, of course, but most applications don’t even bother, so the world we live in is filled with ‘a slight tinge of error’.

Also, on a different but related subject, it makes me think of a paper Karpathy recently highlighted: x.com


It turns out that the idea of approximating solutions of equations has been around for a lot longer than ML/DL/AI. It goes back at least as far as Isaac Newton, the inventor of calculus who first described the eponymous Newton’s Method. It also arises naturally when we try to apply physics in the real world. In many cases, the behavior we care about is described by differential equations that do not have “closed form” solutions. So we have no choice but to approximate the solutions. Aerodynamics and orbital mechanics are important examples. Everything we do in floating point is just an approximation of the actual mathematics, but if done well it can be “close enough for jazz”. We’ve proven that you can land a spacecraft on Mars with floating point computations approximating the real mathematics.


Presuming, of course, that we’re using the same units of measure for planning and implementation.


Dear colleagues,

I enjoyed reading the great discussion above.

I see the root cause now. However, I think it is always worth checking (or at least understanding) the effective number of meaningful digits and reporting values truncated at that boundary.
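
One way to make that routine is to format results to only the number of significant digits the dtype can reliably carry (a sketch; np.finfo(...).precision reports that digit count, and the report() helper below is just an illustration):

import numpy as np

def report(value, dtype=np.float64):
    # Decimal digits the dtype is reliably precise to: 6 for float32, 15 for float64
    digits = np.finfo(dtype).precision
    return f"{value:.{digits}g}"

print(report(1.65292081707434368))               # 1.65292081707434
print(report(1.65292081707434368, np.float32))   # 1.65292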

The folks who create the ML tutorials are often unfamiliar with the science and engineering concepts of numerical precision, numerical accuracy, and maintaining the correct number of significant figures.


@AKazak you will witness this even in courses. If you make a mistake at the levels we are talking about, your gradients will, even factorially, ‘overflow’.