I understand that scaling makes the learning process more stable for Gradient Descent. Does having smaller values help with decreasing computation time?

No. The values are stored in floating point format, regardless of their magnitude. The value has little or no effect on how long the math operation takes.
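
As a rough illustration of the storage point, here is a minimal sketch assuming NumPy with its default 64-bit floats. The array names `small` and `large` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
small = rng.uniform(0.0, 1.0, size=1000)   # values in [0, 1)
large = small * 1e6                        # same data at a much larger magnitude

# Both arrays hold 64-bit floats, so every element occupies 8 bytes
# no matter how big or small the stored number is.
same_dtype = small.dtype == large.dtype
same_bytes = small.nbytes == large.nbytes
print(small.dtype, small.nbytes, large.nbytes)
```

Because both arrays have the same dtype and size, an element-wise operation on either one does the same amount of floating-point work.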

Wouldn't a flatter curve result in fewer iterations during minimization, and thus potentially reduced overall computation time and cost?

It depends on exactly what "decrease computation time" means.

And on what "a flatter curve" means.

Got it. You both have good points.

I guess @ai_curious means that fewer iterations are needed to reach convergence, and @TMosh meant that the computation time for each iteration is constant, since the data type determines the space needed for each data point.
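
To make the two views concrete, here is a small sketch, not anyone's course code. It runs plain batch gradient descent on a made-up linear regression problem, once on raw features and once on standardized ones, and counts the updates until the gradient is nearly zero. The helper `gd_iterations`, the synthetic data, the tolerance, and the step size (one over the largest eigenvalue of the cost's Hessian) are all assumptions chosen for this example.

```python
import numpy as np

def gd_iterations(X, y, tol=1e-6, max_iters=500_000):
    """Batch gradient descent on mean-squared error; returns the number of updates."""
    m = len(y)
    Xb = np.column_stack([np.ones(m), X])          # prepend a bias column
    hessian = Xb.T @ Xb / m                        # constant for this quadratic cost
    lr = 1.0 / np.linalg.eigvalsh(hessian).max()   # assumed step size: 1 / largest eigenvalue
    w = np.zeros(Xb.shape[1])
    for i in range(max_iters):
        grad = Xb.T @ (Xb @ w - y) / m
        if np.linalg.norm(grad) < tol:             # stop once the gradient is nearly zero
            return i
        w -= lr * grad
    return max_iters

# Hypothetical data: two features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 20, 200)])
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)

# Z-score standardization: zero mean and unit variance per feature.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

unscaled_iters = gd_iterations(X, y)
scaled_iters = gd_iterations(X_scaled, y)
print("unscaled:", unscaled_iters, "scaled:", scaled_iters)
```

Each update costs the same in both runs, because the arrays have identical shapes and dtypes. Only the number of updates changes, which is where any saving in total time would come from.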

Thanks a lot for taking the time to answer my question!