Hey, can anyone explain the reason to me?
In TensorFlow, we compute the gradients and then, in the next step, hand those gradients (the derivatives) to the optimizer. That feels perfectly logical to me, since it mirrors what we would do without an optimizer. Here is the classic code:
grads = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
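For context, here is a minimal, self-contained sketch of the full training step I mean (the model, loss, optimizer, and data below are placeholders I made up, not from any particular project):

import tensorflow as tf

# Hypothetical model / loss / optimizer, just to make the snippet runnable.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_object = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

x = tf.random.normal((8, 4))   # dummy inputs
y = tf.random.normal((8, 1))   # dummy targets

with tf.GradientTape() as tape:
    yhat = model(x)                   # forward pass
    loss = loss_object(y, yhat)       # compute the loss
grads = tape.gradient(loss, model.trainable_weights)            # gradients are explicit objects
optimizer.apply_gradients(zip(grads, model.trainable_weights))  # we hand them to the optimizer ourselves

So in TensorFlow the gradients pass through my hands as a concrete grads list before the optimizer ever sees them.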
However, the classic PyTorch code looks like this:
yhat = model(x)              # forward pass
loss = loss_object(yhat, y)  # compute the loss
optimizer.zero_grad()        # reset the old gradients
loss.backward()              # backpropagate
optimizer.step()             # update the weights
There is no explicit grads variable anywhere.
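From what I can tell, the gradients still exist, they just seem to be stored on the parameters themselves rather than returned to me. Here is a small sketch of what I mean (the model and data are made up purely for illustration):

import torch
import torch.nn as nn

# Hypothetical model / loss / optimizer, just to show where the gradients live.
model = nn.Linear(4, 1)
loss_object = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 4)   # dummy inputs
y = torch.randn(8, 1)   # dummy targets

yhat = model(x)
loss = loss_object(yhat, y)
optimizer.zero_grad()          # resets each parameter's .grad
loss.backward()                # autograd writes d(loss)/d(param) into param.grad
for p in model.parameters():
    print(p.grad)              # the gradients are here, attached to the parameters
optimizer.step()               # the optimizer reads param.grad internally

If that is right, then I suppose optimizer.step() simply reads each param.grad instead of being handed a list of gradients, but I would like to confirm that this is really what happens.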
I know that each framework works fine in its own world, but what exactly does PyTorch do behind the scenes?
Thank you