In the last part of Assignment 4, I didn't get the purpose of adding `.detach()` to `original_classifications`. Can anyone explain?
As far as I remember, the purpose of the detach is to freeze the generator while the discriminator is training.
Hi @YIHUI,
As you know, the generator is needed when calculating the discriminator's loss: we get an image from the generator and feed it to the discriminator. In other words, we have a result that comes from the Generator and will be used by the Discriminator.
If we don't detach this result from the Generator, the Generator will be affected as well, because the returned tensor shares the Generator's computation graph. `.detach()` "breaks" the gradient connection with the Generator, and that is why we use it.
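Here is a minimal sketch of that discriminator step (the model shapes and variable names are placeholders, not the assignment's exact code):

```python
import torch
import torch.nn as nn

# Placeholder models and data -- purely illustrative shapes.
gen = nn.Sequential(nn.Linear(64, 784), nn.Tanh())
disc = nn.Sequential(nn.Linear(784, 1))
disc_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(128, 784)      # stand-in for a batch of real images
fake = gen(torch.randn(128, 64))  # fake images, still attached to gen's graph

disc_opt.zero_grad()
# .detach() cuts the graph at `fake`, so backward() stops there and
# never reaches the Generator's parameters.
fake_loss = criterion(disc(fake.detach()), torch.zeros(128, 1))
real_loss = criterion(disc(real), torch.ones(128, 1))
((fake_loss + real_loss) / 2).backward()
disc_opt.step()                   # updates the Discriminator only
```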
Please let me know if this clarifies your question. If you still have any questions about it, just let us know.
Thanks!
Juan
The one other point worth making here is that this is just a performance issue, not a correctness issue. The training code is careful to start each training step by zeroing the gradients, so that we don't accidentally include previously computed gradients, and to apply the gradients only to the model that is actually being trained in that step. Of course, we alternate between training the generator and training the discriminator. When we train the discriminator, we don't need the gradients of the generator, and computing them is expensive, so why not save the compute cycles?

But also note that the situation is fundamentally asymmetric: when we train the generator, we do need the gradients of the discriminator, because the cost is by definition computed from the discriminator's output. So in that case we can't detach, and we depend on the logic I referred to earlier, which applies the gradients only to the model we are actually training (the generator in that example).
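To make the asymmetry concrete, here is a minimal sketch of the generator step under the same assumptions (placeholder models, not the assignment's code):

```python
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(64, 784), nn.Tanh())
disc = nn.Sequential(nn.Linear(784, 1))
gen_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

gen_opt.zero_grad()
# No .detach() here: the generator's loss is defined through the
# discriminator's output, so gradients must flow back through disc.
fake_pred = disc(gen(torch.randn(128, 64)))
gen_loss = criterion(fake_pred, torch.ones(128, 1))
gen_loss.backward()   # fills .grad on BOTH models
gen_opt.step()        # but only the generator's parameters are updated

# The stale .grad values left on disc are harmless: the next
# discriminator step begins by zeroing its gradients.
```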
I think it should be reversed here, as my question is not about the typical GAN training process.
Thanks, I feel your answer fits the `.detach()` used when training a traditional GAN better. I think the author wants to freeze the weights of the classifier here, since the purpose is to update the noise input so that the final images are pushed towards 'smiling'.
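Roughly, the pattern I mean looks like this (hypothetical model shapes and attribute index, not the assignment's exact code):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for pretrained networks: a generator and a
# multi-attribute classifier, both frozen during this procedure.
gen = nn.Sequential(nn.Linear(64, 784), nn.Tanh())
classifier = nn.Sequential(nn.Linear(784, 40))
smile_idx = 31  # hypothetical index of the 'smiling' attribute

for p in list(gen.parameters()) + list(classifier.parameters()):
    p.requires_grad = False  # only the noise vector gets updated

noise = torch.randn(8, 64, requires_grad=True)

# Fixed baseline scores for the starting images. .detach() turns them
# into constants, so the backward passes below never trace (or try to
# re-traverse) this initial forward pass.
original_classifications = classifier(gen(noise)).detach()
other = torch.arange(40) != smile_idx

for _ in range(10):
    current = classifier(gen(noise))
    # Raise the 'smiling' score while penalizing drift in the other
    # attributes relative to the detached baseline.
    score = current[:, smile_idx].mean() \
        - 0.1 * (current[:, other] - original_classifications[:, other]).abs().mean()
    grad = torch.autograd.grad(score, noise)[0]
    noise = (noise + 0.1 * grad).detach().requires_grad_()  # gradient ascent on noise
```

With the baseline detached, each `autograd.grad` call traces only the current forward pass through the frozen networks back into the noise.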