Why should we detach the discriminator's input?

paulinpaloalto · October 22, 2021, 5:53am

The situation is asymmetric: when we train the discriminator, we can get the gradients of its loss without going through the generator. But when we compute the gradients to train the generator, they (by definition) go through the discriminator, since the generator's loss is computed from the output of the discriminator, right? So when we train the discriminator, we detach the generator's output (the fake images we feed to the discriminator), but we can't do the equivalent in the other direction.

Also note that this is not a “correctness” issue. We are careful never to apply gradients that aren't relevant to the training we're doing on a particular cycle, and we zero them before the next training cycle. It is only a performance issue: computing gradients has a non-trivial compute cost, so it makes sense to do it only when you really need the values.
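Here is a minimal PyTorch sketch of that asymmetry. It is not the assignment code: the tiny `nn.Linear` stand-ins for the generator and discriminator, the tensor sizes, and the variable names are all made up for illustration. The point is only to show where `detach()` cuts the graph and where it can't.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in networks; real GANs use much larger models.
gen = nn.Linear(4, 8)    # generator: noise -> fake sample
disc = nn.Linear(8, 1)   # discriminator: sample -> logit
criterion = nn.BCEWithLogitsLoss()

noise = torch.randn(16, 4)
real = torch.randn(16, 8)

# Discriminator step: detach the fake batch so backward() stops at the
# discriminator's input and never walks back through the generator.
fake = gen(noise)
disc_loss = (criterion(disc(fake.detach()), torch.zeros(16, 1))
             + criterion(disc(real), torch.ones(16, 1))) / 2
disc_loss.backward()
gen_grads_after_disc_step = [p.grad for p in gen.parameters()]
disc_has_grads = all(p.grad is not None for p in disc.parameters())

# Generator step: no detach is possible here, because the loss depends on
# the generator only through the discriminator's output.
disc.zero_grad(set_to_none=True)
gen_loss = criterion(disc(gen(noise)), torch.ones(16, 1))
gen_loss.backward()
gen_has_grads = all(p.grad is not None for p in gen.parameters())
# The discriminator's gradients get computed too, but in a real loop only
# the generator's optimizer would step, and they'd be zeroed afterwards.
disc_grads_also_computed = all(p.grad is not None for p in disc.parameters())
```

In the discriminator step the generator's parameters end up with no gradients at all, which is the saved work. In the generator step the discriminator's gradients are computed as a side effect, which is why that case relies on only stepping the generator's optimizer and zeroing gradients before the next cycle.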
