When we run backprop to compute derivatives, it seems the whole process would accumulate some round-off error and be less precise. Why can we ignore the influence of that?
It's an interesting and perceptive question. Everything we do here (forward propagation as well as back propagation) is done in floating point arithmetic, which is a finite representation by definition. We have at most 2^{32} or 2^{64} distinct numbers that we can represent between -\infty and +\infty, depending on whether we use 32-bit or 64-bit floats. That is completely pathetic compared to the abstract beauty of \mathbb{R}, but we don't have a choice: there is no efficient way for computer software to deal with the uncountably infinite properties of the pure math version of all this.
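If you want to see how coarse that finite representation is, here is a minimal sketch in Python/NumPy (just an illustration I put together, not anything from the course) showing machine epsilon and the classic 0.1 + 0.2 surprise:

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable float.
print(np.finfo(np.float32).eps)   # ~1.19e-07
print(np.finfo(np.float64).eps)   # ~2.22e-16

# Classic example: the decimal 0.1 has no exact binary representation.
print(0.1 + 0.2 == 0.3)           # False
print(f"{0.1 + 0.2:.20f}")        # 0.30000000000000004441...

# np.nextafter shows the nearest representable neighbor of a value,
# i.e. the size of the "hole" right above 1.0 in float32.
x = np.float32(1.0)
print(np.nextafter(x, np.float32(2.0)) - x)
```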
It turns out that mathematicians have thought carefully about these issues. There is an entire subfield of mathematics called Numerical Analysis that deals with finite representations, among other things. You can reason precisely about the error propagation properties of different algorithms when they are subject to rounding errors. You can have "numerically stable" computations, in which the rounding errors roughly cancel out on a statistical basis and stay bounded, or you can have "unstable" computations, in which the rounding errors tend to compound and become unbounded. The algorithms we are using have been carefully chosen to be of the "stable" type. That means that, yes, we will always have rounding errors, but they don't prevent us from finding valid solutions.
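As a rough illustration of what "stable" versus "unstable" means in practice (my own toy examples, not the algorithms from the course), compare a formulation that suffers catastrophic cancellation with one that avoids it:

```python
import numpy as np

x = 1e-17

# Unstable: 1.0 + x rounds to exactly 1.0 in float64, so all the information in x is lost.
unstable = np.log(1.0 + x)       # 0.0
# Stable: log1p is formulated specifically to avoid that cancellation.
stable = np.log1p(x)             # ~1e-17, correct to full precision
print(unstable, stable)

# Another classic: the one-pass variance formula E[x^2] - E[x]^2 cancels badly
# when the mean is huge relative to the spread; the two-pass formula is stable.
data = 1e8 + np.random.randn(10_000)
naive = np.mean(data**2) - np.mean(data)**2        # garbage, can even come out negative
two_pass = np.mean((data - np.mean(data))**2)      # ~1.0, as expected
print(naive, two_pass)
```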
I think it's because the errors introduced are extremely small compared to the magnitude of the features.
@paulinpaloalto @TMosh @bupeigon I don't claim to be an expert on this, but I have also read some papers lately where they are driving things down to 8-bit (or even lower) precision for the sake of efficiency.
If I had to take a wild guess, I think the magnitude of the models in play drowns out the error (similar to what @TMosh says).
Do you have a link to any such papers that we could peruse?
I gave the answer above: everything we do is an approximation. If your algorithms are properly designed, then the rounding errors don't compound and don't prevent us from getting an approximate solution that is "close enough for jazz" and works. My hunch is that whatever you are referring to here about working in 8-bit spaces is addressing something else.
It has been demonstrated that you can land a spacecraft on Mars with 64 bit floating point. Close enough for government work.
Lemme look; I think this is in the context of LLMs. Personally, Paul, since I know hardware, I've always wanted to spin one of these things up in pure analog. Not there yet.
@paulinpaloalto Not the most definitive source, just a quick search:
https://www.eetimes.com/ibm-brings-8-bit-ai-training-to-hardware/
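I haven't dug into that article in detail, but the usual idea behind low-precision training and inference is some form of scale-and-round quantization, where the error introduced per element stays bounded. Here is a rough, hypothetical sketch of symmetric int8 quantization of a weight tensor (my own toy code, not IBM's actual scheme):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map floats into int8 codes in [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# The reconstruction error is bounded by half a quantization step per element.
print(np.max(np.abs(w - w_hat)), scale / 2)
```

The point being that even though each weight only has 256 possible values, the per-element error is bounded by half a quantization step, which is often small relative to the noise already present in training (similar to what @TMosh said above).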
Previously, I've had the "experience" of doing floating point math in assembly language on a machine that only supported 8-bit integers. It was not a lot of fun.
I greatly enjoy the current crop of high-level programming tools and math libraries.
@TMosh Actually, what inspired me to get back into this whole mess was working with real-world machines, or physical hardware... You have to debounce switches and all sorts of crazy stuff, plus you have to deal with race conditions and how the signal propagates around the board.
In contrast, working on a desktop today is "pretty easy".
With enough memory and a fast enough processor and high-level programming tools, software becomes all too easy.
@TMosh Apocryphal, but "640K should be enough for everybody," am I right?
Indeed.