Use of apply method of optimizer

Hello mentor, the following code for updating gradients is from week 2's lab.

def apply_gradient(optimizer, model, x, y):
  # Record the forward pass so the tape can compute gradients of the loss.
  with tf.GradientTape() as tape:
    logits = model(x)
    loss_value = loss_object(y_true=y, y_pred=logits)

  # Gradients of the loss with respect to every trainable weight.
  gradients = tape.gradient(loss_value, model.trainable_weights)
  # Pair each gradient with its weight and let the optimizer update the weights.
  optimizer.apply_gradients(zip(gradients, model.trainable_weights))

  return logits, loss_value
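
For context, I call this helper from a training loop roughly like the sketch below (loss_object, optimizer, and train_dataset here just stand in for the lab's own definitions):

import tensorflow as tf

# Placeholder setup; the lab defines its own model, dataset, and loss.
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

def train(model, train_dataset, epochs=1):
  for epoch in range(epochs):
    for x_batch, y_batch in train_dataset:
      # One forward/backward pass plus one optimizer update per batch.
      logits, loss_value = apply_gradient(optimizer, model, x_batch, y_batch)
    print(f"Epoch {epoch}: last batch loss = {float(loss_value):.4f}")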

For this line of code
optimizer.apply_gradients(zip(gradients, model.trainable_weights))

I tried to use the following code instead, and I thought it would behave the same way:
optimizer.apply(gradients)

However, I got an error message while running the training.

Would you clarify why using the apply() method leads to an error? I noticed in the source code that apply_gradients() calls apply(), so I cannot see where the error comes from.

Also, the first line of the definition of apply_gradients() in the source code is:

grads, trainable_variables = zip(*grads_and_vars)

I understand what zip() does in Python, but I don't understand what zip(*...) does. Could you clarify? Thanks.

My guess as to why you got an error calling the apply() method is that the Adam optimizer probably doesn't expose that method, even though you can see it in the base class source code you're looking at. The Keras documentation for optimizers only mentions the apply_gradients() method, so it's probably not required for all optimizers to expose the apply() method (see the Optimizers page in the Keras docs).
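
One quick way to check what your installed optimizer actually exposes (plain Python, nothing Keras-specific):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
# True/False depending on which optimizer base class your TF/Keras version uses.
print(hasattr(optimizer, "apply"), hasattr(optimizer, "apply_gradients"))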

As far as the zip(*grads_and_vars) you see in the BaseOptimizer() source code, the * operator in front of an iterable in a function call like this unpacks the iterable's elements. So, in this case, it takes the gradients and model.trainable_weights that were zipped together to be passed into apply_gradients() and breaks them back apart into gradients and trainable_weights, so the two can be passed to apply(). That is what is happening in the BaseOptimizer() implementation of apply_gradients() that you shared. The Adam optimizer's implementation of apply_gradients() could be different.
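
To make that concrete, here is a tiny standalone sketch (with toy values instead of real tensors) showing how zip(*...) reverses the zip the lab code does:

# Toy values standing in for gradients and trainable weights.
gradients = [0.1, 0.2, 0.3]
trainable_weights = ["w1", "w2", "w3"]

# What the lab code builds and passes to apply_gradients():
grads_and_vars = list(zip(gradients, trainable_weights))
# -> [(0.1, 'w1'), (0.2, 'w2'), (0.3, 'w3')]

# What apply_gradients() does internally: * unpacks the pairs, and zip()
# regroups the first elements together and the second elements together.
grads, trainable_variables = zip(*grads_and_vars)
# -> grads == (0.1, 0.2, 0.3), trainable_variables == ('w1', 'w2', 'w3')
print(grads, trainable_variables)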

Hello Wendy, thanks for your reply. I now understand the use of zip(*...), but I am still a bit confused about the apply() method of the optimizer.

After looking at the Adam optimizer's source code, it inherits from the base optimizer and does not override the base optimizer's apply() method. So Adam should have the apply() method. Could you provide some clarification?

Yes, Adam has the apply() method, but the way you used it causes infinite recursion in the attribute setters (setattr). Your error log mentions this.

Hello mentor, could you show the right way of using the apply() method? I am still confused about how to use it. Thanks.

When I said Adam does have the apply() method, I was concurring with Wendy's point that the gradient update should be applied to the zipped set of gradients and variables.

Remember, the issue with calling the apply() method on only the gradients is that it can make the gradients explode, because it keeps re-calling its own functions inside the attribute setters. That is why the Keras documentation for Adam uses the apply_gradients() method on the zipped set of gradients and weights, so that each gradient is provided together with its corresponding weight.

When you shared the base optimizer source and noted that Adam does not override it, keep in mind that the base optimizer contains a set of attributes; calling the apply() method on only the gradients means each attribute's function keeps re-calling itself with the gradients, which can make your training explode or produce NaN values.

Roughly speaking, apply() applies the update function to the elements it is given, and in this case the optimizer's set attributes cause those functions to re-call themselves indefinitely, leading to exploding gradients or a NaN value.
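
If you still want to call apply() directly, my understanding from the Keras 3 base optimizer source (treat this as an assumption about the exact version you have installed) is that it expects the variables as well as the gradients, either passed explicitly or via an optimizer that has already been built on them. A minimal sketch:

import tensorflow as tf

# Sketch only: assumes a Keras 3 style optimizer whose base class defines
# apply(grads, trainable_variables=None).
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
model.build(input_shape=(None, 4))
optimizer = tf.keras.optimizers.Adam()

x = tf.random.normal((8, 4))
y = tf.random.normal((8, 2))

with tf.GradientTape() as tape:
  loss_value = tf.reduce_mean(tf.square(model(x) - y))
gradients = tape.gradient(loss_value, model.trainable_weights)

# Equivalent in spirit to apply_gradients(zip(gradients, model.trainable_weights)):
# the variables are passed alongside the gradients, not the gradients alone.
optimizer.apply(gradients, model.trainable_weights)

# optimizer.apply(gradients) on its own only works once the optimizer has been
# built on those variables, e.g. with optimizer.build(model.trainable_weights).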


Thank you so much for the clarification!