Does scaling decrease computation time?

I understand that scaling makes the learning process more stable for gradient descent. Does having smaller values also reduce computation time?

No. The values are stored in floating-point format regardless of their magnitude, so the magnitude has little or no effect on how long each math operation takes.
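
Here is a minimal sketch of the storage point, assuming the features are held as 64-bit floats. The helper name `float64_size_bytes` is made up for illustration:

```python
import struct

def float64_size_bytes(x: float) -> int:
    # Pack the value as a C double ("d") and count the bytes it occupies.
    return len(struct.pack("d", x))

small = float64_size_bytes(0.001)        # a "scaled" value
large = float64_size_bytes(1_000_000.0)  # an "unscaled" value

# Both values take the same number of bytes; magnitude does not change the size.
print(small, large)
```

The byte count is fixed by the data type, so the hardware handles the same amount of data per multiply or add either way.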

Wouldn’t a flatter curve result in fewer iterations during minimization, and thus potentially reduce overall computation time and cost?

It depends on exactly what “decrease computation time” means.
And on what “a flatter curve” means.
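
One way to pin down the "fewer iterations" reading is a toy sketch, not a real training run. It assumes the cost is a simple quadratic bowl whose curvature differs per direction, which is roughly what unscaled features produce. The function name, starting point, tolerance, and step-size rule are all assumptions chosen for illustration:

```python
import math

def iterations_to_converge(curvatures, tol=1e-6, max_iters=100_000):
    """Gradient descent on f(w) = 0.5 * sum(c_i * w_i**2), starting from w_i = 1."""
    # Assumed step-size rule: stay safely below the stability limit 2 / max(c).
    lr = 0.9 / max(curvatures)
    w = [1.0 for _ in curvatures]
    for step in range(1, max_iters + 1):
        # The gradient of f with respect to w_i is c_i * w_i.
        w = [wi - lr * ci * wi for wi, ci in zip(w, curvatures)]
        # Stop once the distance to the minimum (the origin) is below tol.
        if math.sqrt(sum(wi * wi for wi in w)) < tol:
            return step
    return max_iters

unscaled = iterations_to_converge([1.0, 100.0])  # elongated bowl
scaled = iterations_to_converge([1.0, 1.0])      # round bowl, as after scaling

print(unscaled, scaled)
```

In this toy setup the round bowl needs far fewer steps, because the step size is capped by the steepest direction and the shallow direction then crawls. The cost of each individual step is the same in both cases.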

Got it. You both have good points.

I guess @ai_curious means that fewer iterations are needed to reach convergence, and @TMosh meant that the computation time for each iteration is constant, since the data type, not the magnitude, determines the space needed for each data point.

Thanks a lot for taking the time to answer my question!