Course 1 Week 1 Programming Assignment

  1. When we are training the generator, why don’t we call disc_opt.zero_grad(), i.e. why don’t we zero the discriminator’s gradients? The generator learns by backpropagating its error through the discriminator to update its own parameters. When we backpropagate through the discriminator, gradients are computed for it, and since it already holds gradients from its own training step, the old values accumulate with the new ones. Yet we still don’t zero the discriminator’s gradients. Why is that? Please help.

  2. Why do we use retain_graph=True here when training the discriminator?

Hi!

During the generator training phase, we do in fact call disc_opt.zero_grad() at the start of the epoch, before the discriminator’s gradients are computed; check the instructions in the optional part.

The retain_graph=True option is used because there are multiple backward passes in the same iteration of the training loop. You first compute the discriminator loss and call backward() to get gradients for the discriminator’s parameters; then you compute the generator loss and call backward() again to get gradients for the generator’s parameters. These two backward passes share some computation graph nodes (for example, gen_loss includes the discriminator’s score on the fake images produced by the generator), and retain_graph=True ensures the graph is not discarded after the first backward pass.

If you don’t use retain_graph=True in this scenario, you will likely hit a “RuntimeError: Trying to backward through the graph a second time”. This happens because, by default, PyTorch frees the computation graph’s buffers after a backward pass, assuming you won’t need them again. Here, however, the generator’s backward pass reuses part of that graph, so retain_graph=True is needed to keep it from being freed prematurely.
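For concreteness, here is a minimal, self-contained sketch of that pattern. The modules and names (gen, disc, gen_opt, disc_opt) are toy stand-ins, not the assignment’s exact code, but the ordering of the two backward() calls and the role of retain_graph=True are the same idea:

```python
import torch
from torch import nn

# Toy stand-ins for the generator and discriminator (hypothetical, not the assignment's models)
gen = nn.Linear(16, 16)
disc = nn.Linear(16, 1)
gen_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(8, 16)
noise = torch.randn(8, 16)
fake = gen(noise)                       # this node is shared by both losses below

# Discriminator update
disc_opt.zero_grad()
disc_fake_pred = disc(fake)
disc_real_pred = disc(real)
disc_loss = (criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred)) +
             criterion(disc_real_pred, torch.ones_like(disc_real_pred))) / 2
disc_loss.backward(retain_graph=True)   # keep the gen(noise) part of the graph alive
disc_opt.step()

# Generator update
gen_opt.zero_grad()                     # discard grads gen accumulated from disc_loss
gen_fake_pred = disc(fake)              # backprop will reuse the retained fake subgraph
gen_loss = criterion(gen_fake_pred, torch.ones_like(gen_fake_pred))
gen_loss.backward()                     # RuntimeError here if retain_graph=True were omitted
gen_opt.step()
```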

Hope that clears it up. If not, feel free to post your queries.

Regards,
Nithin

Thanks, the second question is clear.
About the first question, what I want to say is: when we train the discriminator, it ends up with some gradient values. When we then train the generator, those old discriminator gradients are still there because we never zero them, so the new gradients accumulate on top of them during the generator training in the same epoch.

Ok, now I get what you are trying to ask.

Remember that loss.backward() only calculates the gradients; it doesn’t update the model’s parameters. opt.step() updates the parameters based on the gradients computed by loss.backward(), and at the start of every epoch we set the gradients to zero. So zeroing the discriminator’s gradients again before the generator part would effectively do nothing unless you also called disc_opt.step() one more time, which we don’t, since the updated generator is what disc_loss uses in the next epoch anyway.
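A tiny sketch of that backward()/step() split, using a throwaway model (the names here are hypothetical, not from the assignment):

```python
import torch
from torch import nn

# backward() only accumulates gradients into .grad; step() is what changes the weights.
model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

opt.zero_grad()                              # start the iteration with clean gradients
loss = nn.functional.mse_loss(model(x), y)

before = model.weight.detach().clone()
loss.backward()                              # fills .grad ...
assert torch.equal(before, model.weight)     # ... but the weights are still unchanged

opt.step()                                   # only now are the weights modified
assert not torch.equal(before, model.weight)

# Zeroing the discriminator's grads again before the generator step would be harmless
# but pointless: without another disc_opt.step(), stale .grad values never reach the weights.
```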


Ok, now I get it. Thanks for the help.
