Measuring loss

Loss in Andrew’s course is computed by measuring the difference in vertical height between the ground truth target label and the algorithm’s prediction.

But wouldn’t it be faster and more accurate to measure the perpendicular distance from the ground truth label to the algorithm’s prediction?
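
For a single-feature line \hat{y} = wx + b (my notation, not the course’s), the two distances from a point (x, y) to the line differ only by a slope-dependent factor:

d_{\text{vertical}} = |y - (wx + b)|, \qquad d_{\text{perpendicular}} = \frac{|y - (wx + b)|}{\sqrt{1 + w^2}}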

Computing the perpendicular distance is computationally more expensive, and the industry feels it does not provide enough benefit to be worth the extra CPU cycles.

Couldn’t it be done with less computational expense using an iterative approach, as is currently done with linear regression?

How would it be simpler than subtracting y_hat from y?

I doubt there would be benefit in doing a regression {edit: was ‘recursion’} for every individual example in the data set for every training epoch.

No, not simpler. But the ground truth label is likely to be closer to the initial prediction when measuring perpendicular distance than when measuring vertical drop.

So perhaps fewer iterations or epochs would be needed, as the prediction starts out closer to the label than the current linear regression estimate.

It would be interesting to perform an experiment and see how expensive computing the loss from perpendicular distances is compared with vertical drop distances.
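
As a rough sketch of such an experiment (assuming NumPy, a single feature, and made-up data, none of which comes from the course), one could time both loss computations side by side:

import time
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 2.0 * x + 1.0 + rng.normal(size=x.size)
w, b = 1.9, 0.9  # some current parameter estimate

def vertical_loss(w, b):
    r = y - (w * x + b)                              # vertical drop from label to prediction
    return np.mean(r ** 2)

def perpendicular_loss(w, b):
    r = (y - (w * x + b)) / np.sqrt(1.0 + w ** 2)    # perpendicular distance to the line
    return np.mean(r ** 2)

for f in (vertical_loss, perpendicular_loss):
    t0 = time.perf_counter()
    for _ in range(100):
        f(w, b)
    print(f.__name__, time.perf_counter() - t0, "seconds")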


I don’t think I said it would be simpler.

Also, what do you mean by “…recursion…”?

Sorry, I meant to write “regression”. I have edited my previous reply.

@conscell

Hi Pavel,

Can you come up with a simplified expression for:

f(x) = \frac{3x^2}{(x^2 + 1)^2}

Hi Stephen,

Although one can add and subtract 3 in the numerator to split the expression, the resulting form is not simpler.

Hi @conscell What about this:

f(x) = \frac{3}{x^2 + 2 + x^{-2}}

What is the most computationally-efficient form of the initial expression to implement on a computer?

Or is there a simpler expression which approximates it well as a function of x?
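
For what it is worth, here is a small sketch (with my own test values, not taken from any attachment in this thread) that checks the two forms agree away from x = 0 and compares their cost with timeit:

import numpy as np
import timeit

x = np.linspace(0.1, 10.0, 1_000_000)   # avoid 0 because of the x^{-2} term

def f_original(x):
    return 3 * x**2 / (x**2 + 1) ** 2

def f_rewritten(x):
    return 3 / (x**2 + 2 + x**-2)

assert np.allclose(f_original(x), f_rewritten(x))   # same values away from 0
print("original :", timeit.timeit(lambda: f_original(x), number=50), "seconds")
print("rewritten:", timeit.timeit(lambda: f_rewritten(x), number=50), "seconds")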

The last expression is undefined at 0. You can run an experiment comparing the execution time in a loop. I suspect that the time difference will be negligible.

Yes, that is to be expected.

A computer program can be set to take care of the case where x = 0.
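
For example, one way to do that (a sketch of my own, assuming NumPy) is to evaluate the rewritten form only where x is non-zero and use the original expression’s value f(0) = 0 otherwise:

import numpy as np

def f_safe(x):
    # evaluate 3 / (x^2 + 2 + x^-2), falling back to f(0) = 0 at x = 0
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    nz = x != 0
    xs = x[nz]
    out[nz] = 3.0 / (xs**2 + 2.0 + xs**-2)
    return out

print(f_safe([0.0, 0.5, 1.0, 2.0]))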

On an Apple Silicon machine, is a floating-point division expensive compared to a floating-point multiplication in terms of execution time?

According to a test I have run, floating-point division is approximately 0.82% faster than multiplication on an Apple Silicon machine running Sequoia.

I used this code
main_2.py (617 Bytes)

Actually I have discovered by experiment that:

f(x) = \frac{3}{x^2}

is a very good approximation to the original expression.

The reason that I am looking into this is to find out whether the gradient descent algorithm can be made to work faster in a linear regression problem by using perpendicular distances from the ground truth labels to the model’s predictions.

Here is mine for your reference. This should avoid the heavy lifting being optimized away, because all the results are different and are kept. Another test could be comparing the run times of your model training before and after your approximation.

Your approximation should be good as long as x^2 \gg 1 because in that case, x^2 + 1 \approx x^2 and we get your form.
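
A quick numeric check of that condition (with my own sample points), comparing the exact expression to the approximation:

import numpy as np

x = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 100.0])
exact = 3 * x**2 / (x**2 + 1) ** 2
approx = 3 / x**2
rel_err = np.abs(exact - approx) / exact
for xi, e, a, r in zip(x, exact, approx, rel_err):
    print(f"x = {xi:6.1f}   exact = {e:.6f}   approx = {a:.6f}   rel. error = {r:.1%}")

The relative error shrinks rapidly once x^2 \gg 1 and is large for small x, which matches the condition above.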

Then would the approximation be unfairly helping perpendicular distance to do better?


I have different results on Threadripper @ RHEL9:

Floating-point multiplication is faster by 3.67%.
[ps@localhost ~]$ python ex_op_speed.py 
Floating-point multiplication: 49.680842 seconds.
Floating-point division: 51.764723 seconds.

Please note, such a test gives only rough estimates for several reasons:

  1. Modern CPUs dynamically adjust their frequency based on workload to save power and reduce heat. If your benchmark starts running while the CPU hasn’t yet ramped up to full speed, the first iterations may execute at a lower frequency. You may want to add a warm-up step before measuring.
  2. Most likely, Python is optimizing the expressions you compute by precomputing the result:
>>> import dis
>>> 
>>> def f_mul():
...     a = 1.2345 * 2.9876
... 
>>> def f_div():
...     a = 1 / 2.9876
... 
>>> dis.dis(f_mul)
  2           0 LOAD_CONST               1 (3.6881922)
              2 STORE_FAST               0 (a)
              4 LOAD_CONST               0 (None)
              6 RETURN_VALUE
>>> dis.dis(f_div)
  2           0 LOAD_CONST               1 (0.3347168295621904)
              2 STORE_FAST               0 (a)
              4 LOAD_CONST               0 (None)
              6 RETURN_VALUE
>>> 

I suggest making a different computation at each step, e.g. a *= 1.00000001 (see the sketch after this list).
  3. Other processes run by the OS might affect the results. Add more iterations and run the test multiple times.
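
Putting those suggestions together, a loop-based comparison might look like the following sketch (mine, not the attached scripts; still only a rough benchmark):

import time

N = 50_000_000

def bench_mul(n):
    a = 1.0
    for _ in range(n):
        a *= 1.00000001      # the operand changes every step, so nothing can be precomputed
    return a                 # return the result so the work is kept

def bench_div(n):
    a = 1.0
    for _ in range(n):
        a /= 1.00000001
    return a

bench_mul(N // 10)           # warm-up so the CPU reaches full clock speed

for name, fn in (("multiplication", bench_mul), ("division", bench_div)):
    t0 = time.perf_counter()
    fn(N)
    print(f"Floating-point {name}: {time.perf_counter() - t0:.6f} seconds.")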

I’m having trouble understanding your English. Can you try again, please?

Thanks.

I got my method from ChatGPT, but I can see it is not as easy as I thought to get accurate benchmark execution times for this.

One more thing: Python adds overhead that can drown out the real cost difference between the operations. If you want to benchmark raw multiplication vs division, you can use C/C++ or, as Raymond @rmwkwok suggested, use NumPy (after all, you will end up using it for your model implementation).
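
For example, a NumPy version that moves the loop out of Python might look like this (a sketch of mine, unrelated to the scripts attached earlier):

import numpy as np
import timeit

a = np.random.default_rng(0).uniform(1.0, 2.0, size=10_000_000)

t_mul = timeit.timeit(lambda: a * 1.00000001, number=100)
t_div = timeit.timeit(lambda: a / 1.00000001, number=100)
print(f"NumPy multiplication: {t_mul:.3f} seconds, division: {t_div:.3f} seconds")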


I have a question about Apple Silicon. I want to replace my 2015 MBA with something more powerful. How is the overall performance? Have you experienced CPU/GPU throttling under high loads such as model training?