Exponential learning rate decay

One of the formulas suggested in the videos for learning rate decay is
alpha_t = 0.95^t * alpha_0.
In that case the sum_{t=0}^{infinity} alpha_t is finite (it is a geometric series). Isn't that a problem? Doesn't it prevent gradient descent from ever reaching the minimum, since the total distance the parameters can travel is bounded?
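For concreteness, here is a minimal sketch of that schedule (the value alpha_0 = 0.2 is an arbitrary illustrative choice, not from the course), showing that the cumulative sum of learning rates converges to alpha_0 / (1 - 0.95) = 20 * alpha_0:

```python
# Exponential decay schedule from the question: alpha_t = 0.95**t * alpha_0.
# alpha_0 = 0.2 is an arbitrary illustrative value, not from the course.
alpha_0 = 0.2
decay_rate = 0.95

partial_sum = 0.0
for t in range(200):
    alpha_t = decay_rate**t * alpha_0
    partial_sum += alpha_t
    if t % 50 == 0:
        print(f"epoch {t:3d}: alpha_t = {alpha_t:.6f}, sum so far = {partial_sum:.4f}")

# Geometric series: sum_{t>=0} 0.95**t * alpha_0 = alpha_0 / (1 - 0.95) = 20 * alpha_0,
# so the total distance the iterates can move is bounded by roughly this budget
# times the largest gradient norm encountered.
print(f"limit of the cumulative sum: {alpha_0 / (1 - decay_rate):.1f}")
```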

Hi, @psv.

It could, depending on the decay rate and the number of epochs you train your model for (which is, of course, finite).

If it does, just tweak your hyperparameters :slight_smile:


@psv
Ditto Ramon’s response.
In Adam optimization, the main hyperparameter you would be tuning is alpha, and it changes from one dataset to another. As far as I know, there is no one-size-fits-all solution :stuck_out_tongue: and it can vary by orders of magnitude!


Couldn’t agree more. Thanks, @suki :slight_smile: