Wk1 Programming Assignment

It’s not that you freeze the weights of the “other” model; it’s that you simply don’t update them. Gradients may get computed for the other model, but you never apply them to its weights.
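As a minimal PyTorch-style sketch of that point (the names `gen`, `disc`, and the layer sizes are placeholders, not the assignment’s actual code): each model gets its own optimizer, so calling `step()` on one optimizer can only ever move that model’s weights, no matter what gradients happen to exist elsewhere.

```python
import torch

# Placeholder models; the assignment's real Generator/Discriminator differ.
gen = torch.nn.Sequential(torch.nn.Linear(64, 784), torch.nn.Tanh())
disc = torch.nn.Sequential(torch.nn.Linear(784, 1))

# Separate optimizers: each one only updates the parameters it was given.
gen_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)

# disc_opt.step() can only change disc's weights, so even if gradients were
# accumulated on gen's parameters, this optimizer never applies them.
```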

The situation is fundamentally asymmetric: when you train the discriminator, you don’t depend on the generator’s gradients at all, so you can detach the generator’s output so that those gradients never get created. Creating them wouldn’t be incorrect, since we would discard them anyway; it’s just wasted compute, so detaching saves the effort.
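Here’s a rough sketch of that discriminator step, continuing with the placeholder names above (the loss and variable names are illustrative, not the assignment’s exact code):

```python
criterion = torch.nn.BCEWithLogitsLoss()

def disc_step(real, noise):
    disc_opt.zero_grad()
    # Detach the fake images so no gradients are built for the generator;
    # the discriminator update doesn't need them, so creating them is wasted work.
    fake = gen(noise).detach()
    fake_pred = disc(fake)
    real_pred = disc(real)
    disc_loss = (criterion(fake_pred, torch.zeros_like(fake_pred)) +
                 criterion(real_pred, torch.ones_like(real_pred))) / 2
    disc_loss.backward()
    disc_opt.step()   # only the discriminator's weights move
    return disc_loss
```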

When you are training the generator, on the other hand, you do need gradients to flow through the discriminator, since the generator’s gradients are computed through the discriminator as part of the chain rule. You still only apply the resulting update to the generator’s weights, though.
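And a sketch of the generator step, again with placeholder names: here you do not detach, so gradients flow back through the discriminator to reach the generator, but only `gen_opt.step()` is called, so the discriminator’s weights stay put.

```python
def gen_step(noise):
    gen_opt.zero_grad()
    fake = gen(noise)                 # no detach here
    fake_pred = disc(fake)            # gradients will flow back through disc
    gen_loss = criterion(fake_pred, torch.ones_like(fake_pred))
    gen_loss.backward()               # disc's parameters get .grad populated...
    gen_opt.step()                    # ...but only the generator is updated
    return gen_loss
```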

Here’s another thread that expresses the same ideas with slightly different wording, so it may also be worth a look.