In the week 1 programming assignment, I only called detach() in get_disc_loss(), which is consistent with the hints in the lab.
My question is whether we also need to freeze the discriminator's weights in get_gen_loss(). The course video clearly says that when you train the generator, backpropagation should only update the generator's weights, and the discriminator's weights should be frozen.
I didn't freeze the weights and still got a 100% score. Is this expected?
If I really did want to freeze the weights in this case, how should I do that? My understanding is that calling detach() on the discriminator's output won't work, because it detaches that node from the compute graph, so its parent nodes, including the generator, would be cut off as well.
It’s not that you freeze the weights of the “other” model: it’s that you don’t update them. You may generate gradients for the other model, but you simply don’t apply them.
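To make that concrete, here is a minimal sketch (the toy models and learning rate are just stand-ins, not the assignment's code): each optimizer is handed only its own model's parameters, so its step() can never touch the other model, no matter which .grad fields were populated.

```python
import torch
from torch import nn

# Toy stand-ins so the sketch is self-contained; the real models come from
# the assignment's Generator / Discriminator classes.
gen = nn.Linear(64, 784)
disc = nn.Linear(784, 1)

# Each optimizer only knows about its own model's parameters, so its step()
# can only ever update those, regardless of what backward() produced.
gen_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)
```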
The situation is fundamentally asymmetric: when you train the discriminator, you don't depend on the generator's gradients at all, so you can detach the generator's output so that those gradients are never created. Creating them wouldn't be incorrect, since we would discard them anyway; it would just be wasted compute, so detaching saves that effort.
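As a rough sketch of where that detach() typically sits (the parameter names here are my assumption, not necessarily the assignment's exact signature):

```python
import torch

def get_disc_loss(gen, disc, criterion, real, num_images, z_dim, device):
    # Generate fake images from random noise
    noise = torch.randn(num_images, z_dim, device=device)
    fake = gen(noise)

    # detach() cuts the graph at the fake images, so backward() on this loss
    # stops at the discriminator and never walks back into the generator:
    # no generator gradients are created at all.
    disc_fake_pred = disc(fake.detach())
    disc_fake_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))

    # The real images were never produced by the generator, so nothing to detach
    disc_real_pred = disc(real)
    disc_real_loss = criterion(disc_real_pred, torch.ones_like(disc_real_pred))

    return (disc_fake_loss + disc_real_loss) / 2
```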
When you are training the generator, by contrast, you do need the gradients through the discriminator, since the generator's gradients include the discriminator's gradients as part of the Chain Rule calculation.
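And for contrast, a sketch of get_gen_loss with no detach() anywhere (same caveat about the names being my assumption):

```python
import torch

def get_gen_loss(gen, disc, criterion, num_images, z_dim, device):
    noise = torch.randn(num_images, z_dim, device=device)
    fake = gen(noise)

    # No detach() here: the loss must stay connected through the discriminator
    # so that, via the chain rule, gradients can flow all the way back into
    # the generator. This also populates .grad on the discriminator's
    # parameters, but those values are simply never applied by gen_opt.
    disc_fake_pred = disc(fake)
    return criterion(disc_fake_pred, torch.ones_like(disc_fake_pred))
```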
Here’s another thread that expresses the same ideas in slightly different wording, so it may also be worth a look.
Thanks Paul, I see. So the generator's optimizer was constructed with only gen.parameters(), so when we call step(), it will only update gen.parameters().
Yes, that's how the optimizer is defined, but it's a little more involved than that. Take a look at the detailed per-batch training logic, including the calls to the zero_grad() method and the step() method.
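Putting the pieces together as a sketch only (reusing the toy gen, disc, and optimizers from above, and assuming a dataloader, criterion, z_dim, and device exist; the real notebook has additional details):

```python
for real, _ in dataloader:
    real = real.to(device)
    cur_batch_size = len(real)

    ### Update the discriminator ###
    disc_opt.zero_grad()      # clears disc .grad, including anything left
                              # over from the previous generator update
    disc_loss = get_disc_loss(gen, disc, criterion, real,
                              cur_batch_size, z_dim, device)
    disc_loss.backward()      # gradients for disc only (generator detached)
    disc_opt.step()           # applies updates to disc.parameters() only

    ### Update the generator ###
    gen_opt.zero_grad()
    gen_loss = get_gen_loss(gen, disc, criterion,
                            cur_batch_size, z_dim, device)
    gen_loss.backward()       # gradients flow through disc into gen
    gen_opt.step()            # applies updates to gen.parameters() only
```

Notice that any discriminator gradients created during the generator's backward pass are never applied, and they get cleared by disc_opt.zero_grad() at the start of the next discriminator update, so nothing "leaks" between the two updates.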