Why don't we use a vector of initial learning rates in Adam optimization instead of a single one?

Class: Additional Neural Network Concepts > Advanced Optimization

Hey everyone,
In Adam optimization we use only one value for the initial learning rate, whereas, as discussed in the video, we have alpha values alpha[1] through alpha[11]. Shouldn't we use 11 values for the optimization?

There is no method for adjusting multiple independent learning rates at the same time.

I mean, why didn't they implement it that way, taking an array instead of a single value?

Because it would be very complicated, and is not necessary.

If you normalize the features, then one learning rate applies equally to everything, and normalization also makes the minimization more efficient.
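For instance, here is a minimal sketch (using a made-up feature matrix) of the z-score normalization being referred to; once every feature is on a comparable scale, the gradients for all the weights are too, so one scalar learning rate can serve them all:

```python
import numpy as np

# Hypothetical feature matrix: column 0 is in the thousands, column 1 is tiny.
X = np.array([[2104.0, 0.5],
              [1416.0, 0.7],
              [1534.0, 0.2]])

# Z-score normalization: each column ends up with mean 0 and std 1,
# so the gradients for all the weights live on a similar scale and a
# single learning rate can step them all reasonably.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm)
```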

Also, to understand why the learning rate can't be an array, one needs to know that the learning rate argument is a float value, a constant float tensor, or a callable that takes no arguments and returns the actual value to use.

So passing an array will cause an error.
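As a rough sketch (exact type checking can differ between TensorFlow versions), these are the forms the learning_rate argument is documented to accept, and why a per-parameter array doesn't fit:

```python
import tensorflow as tf

# A plain float -- the usual case.
opt_a = tf.keras.optimizers.Adam(learning_rate=0.001)

# A callable that takes no arguments and returns the value to use.
opt_b = tf.keras.optimizers.Adam(learning_rate=lambda: 0.001)

# A learning-rate schedule object is also accepted.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.001, decay_steps=1000, decay_rate=0.96)
opt_c = tf.keras.optimizers.Adam(learning_rate=schedule)

# Something like learning_rate=[0.001] * 11 is not one of these forms,
# so there is no supported way to pass Adam one rate per parameter
# through this argument.
```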

But in the lecture, the teacher said that:

It uses a different learning rate for every single parameter of your model.

Please give the lecture title and time mark.

The lecture is slightly misleading.

Here is how TensorFlow implements Adam optimization (see below). Notice that the learning rate is a scalar value. This is the only learning rate that you have access to.
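For reference, here is a sketch of a typical constructor call with the documented defaults (these may shift slightly between TensorFlow releases); learning_rate, beta_1, beta_2, and epsilon are each a single scalar:

```python
import tensorflow as tf

# learning_rate, beta_1, beta_2, and epsilon are all single scalar values;
# there is no argument that takes one learning rate per parameter.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
)
```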

Adam adapts the effective step size for each weight individually, based on first- and second-order moment estimates of its gradients. These are controlled by the beta_1 and beta_2 factors (also scalars). But you (as the designer) cannot set a separate learning rate for each feature.
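To make that concrete, here is a minimal NumPy sketch of the Adam update in its standard textbook form (not TensorFlow's actual code): alpha is one scalar, but because the moment estimates m and v are per-parameter arrays, the effective step alpha * m_hat / (sqrt(v_hat) + eps) differs from parameter to parameter.

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta_1=0.9, beta_2=0.999, eps=1e-8):
    """One Adam update in its standard textbook form (not TensorFlow's code).

    alpha is a single scalar, but m and v are per-parameter arrays, so the
    effective step alpha * m_hat / (sqrt(v_hat) + eps) is per-parameter.
    """
    m = beta_1 * m + (1 - beta_1) * grad        # 1st-moment (mean) estimate
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # 2nd-moment (uncentered) estimate
    m_hat = m / (1 - beta_1 ** t)               # bias correction
    v_hat = v / (1 - beta_2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy example: parameter 0 gets a noisy (sign-flipping) gradient, parameter 1
# a steady one. Their effective steps diverge even though alpha is one scalar.
w, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
grads = [np.array([ 1.0, 1.0]),
         np.array([-1.0, 1.0]),
         np.array([ 1.0, 1.0]),
         np.array([-1.0, 1.0])]
for t, g in enumerate(grads, start=1):
    w, m, v = adam_step(w, g, m, v, t)
print(w)  # parameter 1 has moved noticeably further than parameter 0
```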

I may have some of the details wrong; the implementation of an optimizer is complex and not my area of expertise.

Thanks, this explains a lot!