-
When we are training the Generator, why don’t we call disc_opt.zero_grad(), i.e. why don’t we set the discriminator’s gradients to zero? The generator learns by backpropagating the error through the discriminator to itself to update its own parameters. When we backpropagate through the discriminator, gradients are written into it, but the discriminator already has gradient values from its own training step, so the old values will accumulate with the new ones. Yet we still don’t zero the discriminator’s gradients. Why is that? Please help.
-
Why do we use retain_graph=True here when training the Discriminator?
Hi!
During the generator training phase, we do in fact call disc_opt.zero_grad() at the start of each discriminator update, before computing its gradients; check the instructions in the optional part.
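To make the ordering concrete, here is a hedged sketch of one training iteration. The tiny Linear models and the names (gen, disc, disc_opt, gen_opt, criterion) are stand-ins matching this thread, not the exact course code:

```python
import torch

torch.manual_seed(0)
gen = torch.nn.Linear(8, 2)   # tiny stand-in generator
disc = torch.nn.Linear(2, 1)  # tiny stand-in discriminator
gen_opt = torch.optim.SGD(gen.parameters(), lr=0.01)
disc_opt = torch.optim.SGD(disc.parameters(), lr=0.01)
criterion = torch.nn.BCEWithLogitsLoss()

for _ in range(3):
    real = torch.randn(16, 2)
    noise = torch.randn(16, 8)

    # Discriminator step: zeroing here also wipes any stale gradients
    # that the previous generator step left on the discriminator.
    disc_opt.zero_grad()
    fake = gen(noise)
    disc_loss = (criterion(disc(fake), torch.zeros(16, 1)) +
                 criterion(disc(real), torch.ones(16, 1))) / 2
    disc_loss.backward(retain_graph=True)  # gen's subgraph is reused below
    disc_opt.step()

    # Generator step: gen_loss scores the same fake batch with the
    # discriminator, so its backward pass reuses the retained graph.
    gen_opt.zero_grad()
    gen_loss = criterion(disc(fake), torch.ones(16, 1))
    gen_loss.backward()
    gen_opt.step()
```

Note that each optimizer zeroes only its own model’s gradients, right before that model’s backward pass.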
The retain_graph=True option is used because there are multiple backward passes in the same iteration of the training loop. In this loop, you first calculate the discriminator loss and perform a backward pass to compute gradients for the discriminator’s parameters. Then you calculate the generator loss and perform another backward pass to compute gradients for the generator’s parameters. These two backward passes share some of the same computation graph nodes (for example, gen_loss includes the discriminator’s score of the fake images produced by the generator), and using retain_graph=True ensures that the graph is not discarded after the first backward pass.
If you don’t use retain_graph=True in this scenario, you will encounter an error along the lines of “RuntimeError: Trying to backward through the graph a second time”. This happens because, by default, PyTorch frees the computation graph’s buffers after a backward pass, assuming you won’t need them again. However, in this case you do need to reuse parts of the graph for the generator’s backward pass, so retain_graph=True is necessary to prevent the graph from being freed prematurely.
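A minimal demo of this behavior, using hypothetical tiny models rather than the course code:

```python
import torch

gen = torch.nn.Linear(2, 2)
disc = torch.nn.Linear(2, 1)
fake = gen(torch.randn(4, 2))          # subgraph shared by both losses

disc_loss = disc(fake).mean()
gen_loss = -disc(fake).mean()

disc_loss.backward(retain_graph=True)  # keep the shared graph alive
gen_loss.backward()                    # succeeds; without retain_graph=True
                                       # above, this second backward raises
                                       # "RuntimeError: Trying to backward
                                       # through the graph a second time"
```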
Hope you get the point. If not, feel free to post your queries.
Regards,
Nithin
Thanks. The 2nd question is clear.
But about the first question, what I mean is: when we train the discriminator, it gets some gradient values. When we then train the generator, those old discriminator gradients are used again because we never set them to zero, so they accumulate during generator training in the same iteration.
Ok, now I get what you are trying to ask.
Remember that loss.backward() only computes gradients; it doesn’t update the model’s parameters. opt.step() updates the parameters based on the gradients that loss.backward() computed. And at the start of each discriminator update we set its gradients to zero, so any stale values that accumulated during the generator step are cleared before they are ever used by disc_opt.step(). Setting the discriminator’s gradients to zero again before the generator part would therefore change nothing, unless you also called disc_opt.step() one more time. But we don’t do that, since the updated gen is used by disc_loss in the next iteration anyway.
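A quick sketch of that distinction, with a tiny hypothetical model: backward() only fills in .grad, and the parameters move only when opt.step() is called.

```python
import torch

torch.manual_seed(0)
disc = torch.nn.Linear(2, 1)
disc_opt = torch.optim.SGD(disc.parameters(), lr=0.1)

before = disc.weight.clone()
loss = disc(torch.randn(4, 2)).mean()
loss.backward()                            # gradients are computed ...
assert torch.equal(disc.weight, before)    # ... but weights are untouched

disc_opt.step()                            # only now do the weights change
assert not torch.equal(disc.weight, before)

disc_opt.zero_grad()                       # clears stale grads before the
                                           # next backward pass
```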
Ok now I get it. Thanks for the help.