Gradient Descent in PyTorch vs. TF

Hey, can anyone explain the reasoning here?

In TF, we compute the gradients and then pass them (the derivatives) to the optimizer for the update step. That seems perfectly logical to me; at least it closely mirrors what we would do without an optimizer. Here is the classic code:

  grads = tape.gradient(loss, model.trainable_weights)
  optimizer.apply_gradients(zip(grads, model.trainable_weights))

However, the classic PyTorch code looks like this:

  yhat = model(x)
  loss = loss_object(yhat, y)
  loss.backward()
  optimizer.step()

There isn’t anything like a grads variable.

I know that both of them work in their own worlds, but what does PyTorch do behind the scenes?

Thank you

PyTorch does the same thing when performing gradient descent.

When loss.backward() is called, the loss is differentiated with respect to all the trainable weights of the network, and each weight’s .grad attribute accumulates its gradient.
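You can see those per-parameter gradients directly. A minimal sketch (the tiny model, shapes, and names here are illustrative, not from the thread):

```python
import torch
import torch.nn as nn

# a tiny model, just to have some trainable weights
model = nn.Linear(3, 1)
loss_object = nn.MSELoss()

x = torch.randn(4, 3)
y = torch.randn(4, 1)

yhat = model(x)
loss = loss_object(yhat, y)

# before backward(), no gradients exist yet
assert all(p.grad is None for p in model.parameters())

loss.backward()

# after backward(), every trainable parameter holds its gradient in .grad --
# this list is the PyTorch counterpart of TF's grads
grads = [p.grad for p in model.parameters()]
assert all(g is not None for g in grads)
print([tuple(g.shape) for g in grads])
```

So the gradients are still there; they just live on the parameters themselves instead of in a separate list.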

optimizer.step() then updates the trainable weights of the network using those accumulated gradients.
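Put together, one PyTorch training step plays the role of TF’s gradient/apply_gradients pair. A minimal sketch (model, data, and learning rate are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_object = nn.MSELoss()

x = torch.randn(8, 3)
y = torch.randn(8, 1)

optimizer.zero_grad()                # clear gradients left over from a previous step
loss = loss_object(model(x), y)      # forward pass
loss.backward()                      # fills p.grad for every trainable parameter
optimizer.step()                     # for plain SGD: p := p - lr * p.grad
```

Because backward() accumulates (adds) into .grad rather than overwriting it, the zero_grad() call is what keeps each step using only the current batch’s gradients.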

Thanks! After seeing this code in some projects, I understand why now. Thx!

        # backward pass: compute gradient of the loss with respect to all the learnable parameters
        # update parameters: slope = slope - lr * slope.grad and bias = bias - lr * bias.grad
        # zero the gradients before running the backward pass