Measuring loss

Loss in Andrew’s course is computed by measuring the difference in vertical height between the ground truth target label and the algorithm’s prediction.

But wouldn’t it be faster and more accurate to measure the perpendicular distance from the ground truth label to the algorithm’s prediction?
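
For a single-feature line \hat{y} = wx + b (my notation, not the course’s), the two distances from a point (x, y) to the line differ only by a slope-dependent factor:

d_{\text{vertical}} = |y - (wx + b)|, \qquad d_{\text{perpendicular}} = \frac{|y - (wx + b)|}{\sqrt{1 + w^2}}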

Computing the perpendicular distance is computationally more expensive, and the industry feels it does not provide enough benefit to be worth the extra CPU cycles.

Couldn’t it be done with less computational expense using an iterative approach, as is currently done with linear regression?

How would it be simpler than subtracting y_hat from y?

I doubt there would be benefit in doing a regression {edit: was ‘recursion’} for every individual example in the data set for every training epoch.

No, not simpler. But the ground truth label is likely to be closer to the initial prediction when measuring perpendicular distance than when measuring vertical drop.

So perhaps fewer iterations or epochs would be needed, as the prediction starts out closer to the label than the current linear regression estimate.

It would be interesting to perform an experiment and see how expensive computing the loss from perpendicular distances is compared with vertical drop distances.
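
As a rough sketch of such an experiment (assuming NumPy, a single feature, and made-up data, none of which comes from the course), one could time both loss computations side by side:

import time
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 2.0 * x + 1.0 + rng.normal(size=x.size)
w, b = 1.9, 0.9  # some current parameter estimate

def vertical_loss(w, b):
    r = y - (w * x + b)                              # vertical drop from label to prediction
    return np.mean(r ** 2)

def perpendicular_loss(w, b):
    r = (y - (w * x + b)) / np.sqrt(1.0 + w ** 2)    # perpendicular distance to the line
    return np.mean(r ** 2)

for f in (vertical_loss, perpendicular_loss):
    t0 = time.perf_counter()
    for _ in range(100):
        f(w, b)
    print(f.__name__, time.perf_counter() - t0, "seconds")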


I don’t think I said it would be simpler.

Also, what do you mean by “…recursion…”?

Sorry, I meant to write “regression”. I have edited my previous reply.

@conscell

Hi Pavel,

Can you come up with a simplified expression for:

f(x) = \frac{3x^2}{(x^2 + 1)^2}

Hi Stephen,

Although one can add and subtract 3 in the numerator to split the expression, the resulting form is not simpler.

Hi @conscell What about this:

f(x) = \frac{3}{x^2 + 2 + x^{-2}}

What is the most computationally-efficient form of the initial expression to implement on a computer?

Or is there a simpler expression which approximates it well as a function of x?
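
For what it is worth, here is a small sketch (with my own test values, not taken from any attachment in this thread) that checks the two forms agree away from x = 0 and compares their cost with timeit:

import numpy as np
import timeit

x = np.linspace(0.1, 10.0, 1_000_000)   # avoid 0 because of the x^{-2} term

def f_original(x):
    return 3 * x**2 / (x**2 + 1) ** 2

def f_rewritten(x):
    return 3 / (x**2 + 2 + x**-2)

assert np.allclose(f_original(x), f_rewritten(x))   # same values away from 0
print("original :", timeit.timeit(lambda: f_original(x), number=50), "seconds")
print("rewritten:", timeit.timeit(lambda: f_rewritten(x), number=50), "seconds")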

The last expression is undefined at 0. You can run an experiment comparing the execution time in a loop. I suspect that the time difference will be negligible.

Yes, that is to be expected.

A computer program can be set to take care of the case where x = 0.
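
For example, one way to do that (a sketch of my own, assuming NumPy) is to evaluate the rewritten form only where x is non-zero and use the original expression’s value f(0) = 0 otherwise:

import numpy as np

def f_safe(x):
    # evaluate 3 / (x^2 + 2 + x^-2), falling back to f(0) = 0 at x = 0
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x != 0
    xs = x[nz]
    out[nz] = 3.0 / (xs**2 + 2.0 + xs**-2)
    return out

print(f_safe([0.0, 0.5, 1.0, 2.0]))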

On an Apple Silicon machine, is a floating-point division expensive compared to a floating-point multiplication in terms of execution time?

According to a test I have run, floating-point division is approximately 0.82% faster than multiplication on an Apple Silicon machine running Sequoia.

I used this code
main_2.py (617 Bytes)

Actually I have discovered by experiment that:

f(x) = \frac{3}{x^2}

is a very good approximation to the original expression.

The reason that I am looking into this is to find out whether the gradient descent algorithm can be made to work faster in a linear regression problem by using perpendicular distances from the ground truth labels to the model’s predictions.

Here is mine for your reference. This should avoid the heavy lifting being optimized away, because all the results are different and are kept. Another test could be comparing the run times of your model training before and after your approximation.

Your approximation should be good as long as x^2 \gg 1 because in that case, x^2 + 1 \approx x^2 and we get your form.
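
A quick numeric check of that condition (with my own sample points), comparing the exact expression to the approximation:

import numpy as np

x = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 100.0])
exact = 3 * x**2 / (x**2 + 1) ** 2
approx = 3 / x**2
rel_err = np.abs(exact - approx) / exact
for xi, e, a, r in zip(x, exact, approx, rel_err):
    print(f"x = {xi:6.1f}   exact = {e:.6f}   approx = {a:.6f}   rel. error = {r:.1%}")

The relative error shrinks rapidly once x^2 \gg 1 and is large for small x, which matches the condition above.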

Then would the approximation be unfairly helping perpendicular distance to do better?


I have different results on Threadripper @ RHEL9:

Floating-point multiplication is faster by 3.67%.
[ps@localhost ~]$ python ex_op_speed.py 
Floating-point multiplication: 49.680842 seconds.
Floating-point division: 51.764723 seconds.

Please note, such a test gives only rough estimates for several reasons:

  1. Modern CPUs dynamically adjust their frequency based on workload to save power and reduce heat. If your benchmark starts running while the CPU hasn’t yet ramped up to full speed, the first iterations may execute at a lower frequency. You may want to add a warm-up step before measuring.
  2. Most likely, Python is optimizing the expressions you compute by precomputing the result:
>>> import dis
>>> 
>>> def f_mul():
...     a = 1.2345 * 2.9876
... 
>>> def f_div():
...     a = 1 / 2.9876
... 
>>> dis.dis(f_mul)
  2           0 LOAD_CONST               1 (3.6881922)
              2 STORE_FAST               0 (a)
              4 LOAD_CONST               0 (None)
              6 RETURN_VALUE
>>> dis.dis(f_div)
  2           0 LOAD_CONST               1 (0.3347168295621904)
              2 STORE_FAST               0 (a)
              4 LOAD_CONST               0 (None)
              6 RETURN_VALUE
>>> 

I suggest making a different computation at each step, e.g. a *= 1.00000001 (see the sketch after this list).
  3. Other processes run by the OS might affect the results. Add more iterations and run the test multiple times.
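
Putting those suggestions together, a loop-based comparison might look like the following sketch (mine, not the attached scripts; still only a rough benchmark):

import time

N = 50_000_000

def bench_mul(n):
    a = 1.0
    for _ in range(n):
        a *= 1.00000001      # the operand changes every step, so nothing can be precomputed
    return a                 # return the result so the work is kept

def bench_div(n):
    a = 1.0
    for _ in range(n):
        a /= 1.00000001
    return a

bench_mul(N // 10)           # warm-up so the CPU reaches full clock speed

for name, fn in (("multiplication", bench_mul), ("division", bench_div)):
    t0 = time.perf_counter()
    fn(N)
    print(f"Floating-point {name}: {time.perf_counter() - t0:.6f} seconds.")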

I’m having trouble understanding your English. Can you try again, please?

Thanks.

I got my method from ChatGPT, but I can see it is not as easy as I thought to get accurate benchmark execution times for this.

One more thing: Python adds overhead that can drown out the real cost difference between the operations. If you want to benchmark raw multiplication vs division, you can use C/C++ or, as Raymond @rmwkwok suggested, use NumPy (after all, you will end up using it for your model implementation).
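
For example, a NumPy version that moves the loop out of Python might look like this (a sketch of mine, unrelated to the scripts attached earlier):

import numpy as np
import timeit

a = np.random.default_rng(0).uniform(1.0, 2.0, size=10_000_000)

t_mul = timeit.timeit(lambda: a * 1.00000001, number=100)
t_div = timeit.timeit(lambda: a / 1.00000001, number=100)
print(f"NumPy multiplication: {t_mul:.3f} seconds, division: {t_div:.3f} seconds")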


I have a question about Apple Silicon. I want to replace my 2015 MBA with something more powerful. How is the overall performance? Have you experienced CPU/GPU throttling under high loads such as model training?