Detach() used in Assignment 4

The other point worth making here is that this is a performance issue, not a correctness issue. The training code is careful to start each training step by zeroing the gradients, so that we don't accidentally accumulate previously computed gradients, and to apply gradients only to the model that is actually being trained in that step. Of course, we alternate between training the generator and training the discriminator. When we train the discriminator, we don't need the gradients of the generator, and computing them is expensive, so detaching saves those compute cycles.

But also note that the situation is fundamentally asymmetric: when we train the generator, we do need the gradients of the discriminator, because the generator's cost is by definition computed from the output of the discriminator. In that case we can't do the "detach", and instead we depend on the logic I referred to earlier, which is careful to apply only the gradients of the model we are actually training (the generator in that case). A sketch of both steps is below.
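To make that concrete, here is a minimal sketch of one combined training iteration in PyTorch. The names (`gen`, `disc`, `gen_opt`, `disc_opt`, `real`, `z_dim`) and the BCE-with-logits losses are just illustrative assumptions about a typical GAN setup, not the actual assignment code; the point is where `detach()` appears and where it doesn't.

```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, gen_opt, disc_opt, real, z_dim, device):
    """One discriminator step followed by one generator step (illustrative sketch)."""
    batch_size = real.size(0)

    # --- Discriminator step ---
    disc_opt.zero_grad()                          # start from zero gradients
    noise = torch.randn(batch_size, z_dim, device=device)
    fake = gen(noise)
    # detach(): we don't need gradients w.r.t. the generator here,
    # so we cut the graph and skip that part of the backward pass.
    disc_fake_pred = disc(fake.detach())
    disc_real_pred = disc(real)
    disc_loss = (
        F.binary_cross_entropy_with_logits(disc_fake_pred, torch.zeros_like(disc_fake_pred))
        + F.binary_cross_entropy_with_logits(disc_real_pred, torch.ones_like(disc_real_pred))
    ) / 2
    disc_loss.backward()
    disc_opt.step()                               # only discriminator parameters are updated

    # --- Generator step ---
    gen_opt.zero_grad()
    noise = torch.randn(batch_size, z_dim, device=device)
    fake = gen(noise)
    # No detach here: the generator's loss is computed through the discriminator,
    # so gradients must flow back through it. The discriminator parameters still
    # receive gradients, but gen_opt.step() only updates the generator.
    disc_fake_pred = disc(fake)
    gen_loss = F.binary_cross_entropy_with_logits(disc_fake_pred, torch.ones_like(disc_fake_pred))
    gen_loss.backward()
    gen_opt.step()                                # only generator parameters are updated

    return disc_loss.item(), gen_loss.item()
```

Note how correctness comes from `zero_grad()` at the start of each step plus the fact that each optimizer only holds one model's parameters; `detach()` in the discriminator step merely avoids a backward pass through the generator that would be thrown away anyway.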
