Use of apply method of optimizer

Hello mentor, the following code for updating gradients is from week 2's lab.

def apply_gradient(optimizer, model, x, y):
  # Record the forward pass so the tape can compute gradients of the loss.
  with tf.GradientTape() as tape:
    logits = model(x)
    loss_value = loss_object(y_true=y, y_pred=logits)

  # Gradients of the loss with respect to every trainable weight.
  gradients = tape.gradient(loss_value, model.trainable_weights)
  # Pair each gradient with its weight and let the optimizer update the weights.
  optimizer.apply_gradients(zip(gradients, model.trainable_weights))

  return logits, loss_value
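
For context, I call this helper from a training loop roughly like the sketch below (loss_object, optimizer, and train_dataset here just stand in for the lab's own definitions):

import tensorflow as tf

# Placeholder setup; the lab defines its own model, dataset, and loss.
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

def train(model, train_dataset, epochs=1):
  for epoch in range(epochs):
    for x_batch, y_batch in train_dataset:
      # One forward/backward pass plus one optimizer update per batch.
      logits, loss_value = apply_gradient(optimizer, model, x_batch, y_batch)
    print(f"Epoch {epoch}: last batch loss = {float(loss_value):.4f}")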

For this line of code
optimizer.apply_gradients(zip(gradients, model.trainable_weights))

I tried to use the following code instead, and I thought it would behave the same way:
optimizer.apply(gradients)

However, I got an error message while running the training.

Would you clarify why using the apply() method leads to an error? I noticed in the source code that apply_gradients() calls apply(), so I cannot see where the error comes from.

Also, the first line of the definition of apply_gradients() in the source code is:

grads, trainable_variables = zip(*grads_and_vars)

I understand what zip() does in Python, but I don't understand what zip(*...) does. Could you clarify? Thanks.

My guess as to why you got an error calling the apply() method is that the Adam optimizer probably doesn't expose that method, even though you can see it in the base class source code you're looking at. The Keras documentation for optimizers only mentions the apply_gradients() method, so it's probably not required for all optimizers to expose the apply() method (see the Optimizers page in the Keras docs).
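
One quick way to check what your installed optimizer actually exposes (plain Python, nothing Keras-specific):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
# True/False depending on which optimizer base class your TF/Keras version uses.
print(hasattr(optimizer, "apply"), hasattr(optimizer, "apply_gradients"))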

As far as the zip(*grads_and_vars) you see in the BaseOptimizer() source code, the * operator in front of an iterable in a function call like this unpacks the iterable's elements. So, in this case, it takes the gradients and model.trainable_weights that were zipped together to be passed into apply_gradients() and breaks them back apart into gradients and trainable_weights, so the two can be passed to apply(). That is what is happening in the BaseOptimizer() implementation of apply_gradients() that you shared. The Adam optimizer's implementation of apply_gradients() could be different.
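
To make that concrete, here is a tiny standalone sketch (with toy values instead of real tensors) showing how zip(*...) reverses the zip the lab code does:

# Toy values standing in for gradients and trainable weights.
gradients = [0.1, 0.2, 0.3]
trainable_weights = ["w1", "w2", "w3"]

# What the lab code builds and passes to apply_gradients():
grads_and_vars = list(zip(gradients, trainable_weights))
# -> [(0.1, 'w1'), (0.2, 'w2'), (0.3, 'w3')]

# What apply_gradients() does internally: * unpacks the pairs, and zip()
# regroups the first elements together and the second elements together.
grads, trainable_variables = zip(*grads_and_vars)
# -> grads == (0.1, 0.2, 0.3), trainable_variables == ('w1', 'w2', 'w3')
print(grads, trainable_variables)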

Hello Wendy, thanks for your reply. I now understand the use of zip(*...), but I am still a bit confused about the apply() method of the optimizer.

After looking at the Adam optimizer's source code, it inherits from the base optimizer and does not override the base optimizer's apply() method. So Adam should have the apply() method. Could you provide some clarification?

Yes, Adam has the apply() method, but the way you used it causes infinite recursion in the attribute setters (setattr). Your error log mentions this.

Hello mentor, could you show the right way of using the apply() method? I am still confused about how to use it. Thanks.

When I said Adam does have the apply() method, I was concurring with Wendy's point that the gradient update should be applied to the zipped set of gradients and variables.

Remember, the issue with calling the apply() method on only the gradients is that it can make the gradients explode, because it keeps re-calling its own functions inside the attribute setters. That is why the Keras documentation for Adam uses the apply_gradients() method on the zipped set of gradients and weights, so that each gradient is provided together with its corresponding weight.

When you shared the base optimizer source and noted that Adam does not override it, keep in mind that the base optimizer contains a set of attributes; calling the apply() method on only the gradients means each attribute's function keeps re-calling itself with the gradients, which can make your training explode or produce NaN values.

Roughly speaking, apply() applies the update function to the elements it is given, and in this case the optimizer's set attributes cause those functions to re-call themselves indefinitely, leading to exploding gradients or a NaN value.
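
If you still want to call apply() directly, my understanding from the Keras 3 base optimizer source (treat this as an assumption about the exact version you have installed) is that it expects the variables as well as the gradients, either passed explicitly or via an optimizer that has already been built on them. A minimal sketch:

import tensorflow as tf

# Sketch only: assumes a Keras 3 style optimizer whose base class defines
# apply(grads, trainable_variables=None).
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model.build(input_shape=(None, 4))
optimizer = tf.keras.optimizers.Adam()

x = tf.random.normal((8, 4))
y = tf.random.normal((8, 2))

with tf.GradientTape() as tape:
  loss_value = tf.reduce_mean(tf.square(model(x) - y))
gradients = tape.gradient(loss_value, model.trainable_weights)

# Equivalent in spirit to apply_gradients(zip(gradients, model.trainable_weights)):
# the variables are passed alongside the gradients, not the gradients alone.
optimizer.apply(gradients, model.trainable_weights)

# optimizer.apply(gradients) on its own only works once the optimizer has been
# built on those variables, e.g. with optimizer.build(model.trainable_weights).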


Thank you so much for the clarification!