Hey, can anyone explain this to me?

In TF we compute the gradients with a tape and then explicitly pass the `gradients` (i.e. the *derivatives*) to the optimizer for the next step. To me this is the logical way to do it; at least it mirrors what we would do by hand without an optimizer. Here is the classic code:

```
grads = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
```

However, the classic PyTorch code looks like this:

```
yhat = model(x)
loss = loss_object(yhat, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

There isn't anything like `grads` here.
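(For context, a small sketch of what I mean: the gradients do get computed, but they seem to live somewhere implicit. The model and data here are made up for illustration; inspecting `p.grad` after `backward()` is the only way I know to see them.)

```python
import torch

# Toy model and data, just for illustration
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_object = torch.nn.MSELoss()

x = torch.randn(4, 3)
y = torch.randn(4, 1)

yhat = model(x)
loss = loss_object(yhat, y)

optimizer.zero_grad()   # clear any stale gradient buffers
loss.backward()         # autograd fills in p.grad for every parameter

# The gradients exist, but they live on the parameters themselves:
for name, p in model.named_parameters():
    print(name, p.grad.shape)

optimizer.step()        # the optimizer reads p.grad internally
```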

I know that both frameworks work fine in their own worlds, but what exactly does PyTorch do behind the scenes?

Thank you