Float type noise

Hello community!

I was wondering whether errors in float calculations somehow affect the training of neural networks?

For example, in Python, 0.1 + 0.2 evaluates to 0.30000000000000004 rather than 0.3.

If anyone has researched this question, I would be glad to hear about your experience!

It’s a well-known phenomenon, and there are methods to handle it.

For example, if you want to test whether a float is (effectively) zero, you don’t use ==. Instead you use a tolerance-based check such as the numpy.isclose() function.
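
Here is a minimal sketch of the pitfall and the tolerance-based fix. math.isclose is in the standard library; numpy.isclose does the same thing element-wise on arrays (assuming NumPy is installed):

```python
import math
import numpy as np

a = 0.1 + 0.2
print(a == 0.3)              # False -- exact comparison fails
print(math.isclose(a, 0.3))  # True  -- compares within a relative tolerance
print(np.isclose(a, 0.3))    # True  -- array-friendly version

# Note: math.isclose has abs_tol=0.0 by default, so for a test against exactly
# zero you need to pass an absolute tolerance, e.g. math.isclose(x, 0.0, abs_tol=1e-12).
# numpy.isclose already defaults atol to 1e-8, so comparing to zero works out of the box.
```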


Thanks!

When you operate in floating point, there are literally only a finite number of values you can represent. In the case of 64 bit floating point, you have to cover the entire range from -\infty to +\infty with 2^{64} numbers. Now 2^{64} may seem like a “big number” in terms of things you deal with on a daily basis. 2^{64} cans of tomato sauce would be a lot of tomato sauce. But it’s ridiculously small if you think about things from a mathematical perspective. In mathematics, there are even different sizes of infinity, right? Bigger and smaller infinities, if you can wrap your mind around that one. The number of integers is \aleph_0, which is strictly smaller than the cardinality of the real numbers, 2^{\aleph_0}. There are uncountably many real values between 0 and 1, or even between 0.0001 and 0.0002.
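A small illustration of the “only finitely many numbers” point, using just the Python standard library (math.ulp and math.nextafter need Python 3.9+): floats have gaps between them, and simple fractions like \frac{1}{3} or even 0.1 are stored as the nearest representable binary64 value.

```python
import math
from fractions import Fraction

x = 1.0
print(math.ulp(x))                      # gap to the next float above 1.0: ~2.22e-16
print(math.nextafter(x, math.inf) - x)  # the same gap, computed directly

# 1/3 cannot be represented exactly; Fraction shows the exact value that
# actually gets stored, which is close to but not equal to 1/3.
print(Fraction(1/3))
print(Fraction(0.1))   # same story for 0.1
```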

When we run computer programs, we are forced to use finite representations like floating point, so we have to deal with the fact that everything we do is an approximation. We can’t even exactly represent something as simple as \frac {1}{3}. But it turns out there are systematic ways to handle this: there is a whole subfield of mathematics called Numerical Analysis devoted to exactly this type of problem. If you study it, you will learn that there are ways to reason precisely about the error propagation properties of various algorithms. Using the techniques of Numerical Analysis, it is possible to formulate algorithms that are “numerically stable”, meaning that the inevitable approximation (rounding) errors tend to statistically cancel out rather than build up.
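A minimal sketch of what “numerically stable” means in practice: the same numbers summed two different ways give different amounts of rounding error. Python’s math.fsum uses a compensated (Shewchuk) summation that tracks the lost low-order bits, so the errors cancel instead of accumulating:

```python
import math

values = [0.1] * 10

print(sum(values))        # 0.9999999999999999 -- naive summation lets errors build up
print(math.fsum(values))  # 1.0 -- compensated summation gives the correctly rounded result
```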

So the bottom line is, yes, you are pointing out a real issue here, but mathematicians have thought about this and the work has been done to make sure that the algorithms we run here give useful approximations to the values we would get if we could build a computer that could do arithmetic using \mathbb{R} instead.


This answer is so clear! This is exactly what I was asking about. I’ve created custom data types in the past for other problems where I needed higher mathematical precision. For now I’m not ready to apply that knowledge in ML, but once I understand deep learning better I’ll come back to this issue; I can already see how it could be useful!


It’s great to hear that the information was relevant to your question. One other point: it might be worth taking a look at the IEEE 754 specification for standard floating point. Numerical libraries like LINPACK and its successor LAPACK are built on hardware arithmetic that follows that standard, and the standard defines a number of different resolutions to choose from, including even higher ones like binary128 and binary256.

I’m not an expert in DL, and everything I know about it is what I’ve seen Prof Ng and his team do here in the various courses, but all the solutions we see here are implemented in either 32 bit or 64 bit floating point. Using one of the higher resolutions like 128 bits would obviously cost more in terms of memory, compute and storage, and then the question would be whether the added resolution actually buys you anything in terms of better quality solutions. My guess, just from the fact that we never see them use anything higher than 64 bit resolution, is that the cost of higher resolutions is not justified by the results. Of course this is an experimental science: we could actually try using higher resolutions and then evaluate both the costs and the benefits, if any! :nerd_face:
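As a rough sketch of what the different resolutions buy you (assuming NumPy is installed), np.finfo reports the machine epsilon and approximate decimal precision of each type. Note that np.longdouble is the closest NumPy gets to a higher resolution, and on most platforms it is 80-bit extended precision rather than true binary128:

```python
import numpy as np

for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    print(dtype.__name__, "eps:", info.eps, "decimal digits:", info.precision)

# The same constant stored at each resolution:
print(f"{np.float32(0.1):.20f}")   # roughly 7 accurate digits
print(f"{np.float64(0.1):.20f}")   # roughly 16 accurate digits

# Platform-dependent "long double" -- usually 80-bit extended, not binary128:
print(np.finfo(np.longdouble).eps)
```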
