In the last part of Assignment 4, I didn't get the purpose of adding `.detach()` to `original_classifications`. Can anyone explain?
As far as I remember, the purpose of the detach is to freeze the generator while the discriminator is training.
Hi @YIHUI,
As you know, the generator is needed when calculating the discriminator's loss: we get an image from the generator and feed it to the discriminator. In other words, we have a result that comes from the Generator and will be used by the Discriminator.
If we don't detach this result from the Generator, the Generator will be affected as well, because the returned tensor shares the Generator's computation graph. `.detach()` "breaks" the gradient connection with the Generator, and that is why we use it.
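Here is a minimal sketch of that discriminator step (the model shapes and variable names are placeholders, not the assignment's exact code):

```python
import torch
import torch.nn as nn

# Placeholder models and data -- purely illustrative shapes.
gen = nn.Sequential(nn.Linear(64, 784), nn.Tanh())
disc = nn.Sequential(nn.Linear(784, 1))
disc_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(128, 784)      # stand-in for a batch of real images
fake = gen(torch.randn(128, 64))  # fake images, still attached to gen's graph

disc_opt.zero_grad()
# .detach() cuts the graph at `fake`, so backward() stops there and
# never reaches the Generator's parameters.
fake_loss = criterion(disc(fake.detach()), torch.zeros(128, 1))
real_loss = criterion(disc(real), torch.ones(128, 1))
((fake_loss + real_loss) / 2).backward()
disc_opt.step()                   # updates the Discriminator only
```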
Please let me know if this clarifies your question. If you still have any questions about it, just let us know.
Thanks!
Juan
The one other point worth making here is that this is just a performance issue, not a correctness issue. The training code is careful to start each training step by zeroing the gradients, so that we don't accidentally include previously computed gradients, and to apply the gradients only to the model that is actually being trained in that step. Of course, we alternate between training the generator and training the discriminator. When we train the discriminator, we don't need the gradients of the generator, and computing them is expensive, so why not save the compute cycles?

But also note that the situation is fundamentally asymmetric: when we train the generator, we do need the gradients of the discriminator, because the cost is by definition computed from the discriminator's output. So in that case we can't detach, and we depend on the logic I referred to earlier, which applies the gradients only to the model we are actually training (the generator in that example).
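To make the asymmetry concrete, here is a minimal sketch of the generator step under the same assumptions (placeholder models, not the assignment's code):

```python
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(64, 784), nn.Tanh())
disc = nn.Sequential(nn.Linear(784, 1))
gen_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

gen_opt.zero_grad()
# No .detach() here: the generator's loss is defined through the
# discriminator's output, so gradients must flow back through disc.
fake_pred = disc(gen(torch.randn(128, 64)))
gen_loss = criterion(fake_pred, torch.ones(128, 1))
gen_loss.backward()   # fills .grad on BOTH models
gen_opt.step()        # but only the generator's parameters are updated

# The stale .grad values left on disc are harmless: the next
# discriminator step begins by zeroing its gradients.
```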
I think it should be reversed here, as my question is not about the typical GAN training process.
Thanks, I feel your answer fits the `.detach()` used when training a traditional GAN better. I think the author wants to freeze the weights of the classifier here, since the purpose is to update the noise input so that the final images are pushed towards 'smiling'.
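Roughly, the pattern I mean looks like this (hypothetical model shapes and attribute index, not the assignment's exact code):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for pretrained networks: a generator and a
# multi-attribute classifier, both frozen during this procedure.
gen = nn.Sequential(nn.Linear(64, 784), nn.Tanh())
classifier = nn.Sequential(nn.Linear(784, 40))
smile_idx = 31  # hypothetical index of the 'smiling' attribute

for p in list(gen.parameters()) + list(classifier.parameters()):
    p.requires_grad = False  # only the noise vector gets updated

noise = torch.randn(8, 64, requires_grad=True)

# Fixed baseline scores for the starting images. .detach() turns them
# into constants, so the backward passes below never trace (or try to
# re-traverse) this initial forward pass.
original_classifications = classifier(gen(noise)).detach()
other = torch.arange(40) != smile_idx

for _ in range(10):
    current = classifier(gen(noise))
    # Raise the 'smiling' score while penalizing drift in the other
    # attributes relative to the detached baseline.
    score = current[:, smile_idx].mean() \
        - 0.1 * (current[:, other] - original_classifications[:, other]).abs().mean()
    grad = torch.autograd.grad(score, noise)[0]
    noise = (noise + 0.1 * grad).detach().requires_grad_()  # gradient ascent on noise
```

With the baseline detached, each `autograd.grad` call traces only the current forward pass through the frozen networks back into the noise.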