Since the generator's output is needed when calculating the discriminator's loss, you will need to call .detach() on that output to ensure that only the discriminator is updated.
A common mistake is to call .detach() on the wrong tensor.
Please evaluate this aspect of your implementation and see if you can modify it to fit the requirement.
I feel I have to understand this a bit better.
This is what the detach() method does:
detach():
Sometimes, you want to calculate and use a tensor’s value without calculating its gradients. For example, if you have two models, A and B, and you want to directly optimize the parameters of A with respect to the output of B, without calculating the gradients through B, then you could feed the detached output of B to A. There are many reasons you might want to do this, including efficiency or cyclical dependencies (i.e. A depends on B depends on A).
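To make the A/B example above concrete, here is a minimal, self-contained sketch. The model names A and B and the tensor shapes are illustrative assumptions, not code from the original project:

```python
import torch
import torch.nn as nn

A = nn.Linear(4, 1)   # model whose parameters we want to optimize
B = nn.Linear(8, 4)   # model we want to treat as a fixed source of values

x = torch.randn(2, 8)
b_out = B(x)

# Detach B's output so the backward pass stops here: A's loss will not
# produce gradients for B's parameters.
loss = A(b_out.detach()).mean()
loss.backward()

print(A.weight.grad is None)  # False: A received gradients
print(B.weight.grad is None)  # True:  B did not
```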
So the reason for calling detach() here is that the Generator and Discriminator share the Tensor? And if you don’t do it you will calculate the loss on all of the Tensor’s content? But you only want the Discriminator’s part?
Yes, the reason for calling detach() here is that the result comes from the Generator and will be used by the Discriminator. If we don't detach it, the Generator will be affected as well: the returned tensor shares the Generator's computation graph, so backpropagating the Discriminator's loss would also compute gradients for the Generator's parameters. Calling detach() "breaks" that gradient connection with the Generator.
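Putting this back into the GAN setting, a discriminator update could look like the sketch below. The names G, D, d_optimizer, the toy layer sizes, and the BCE-style loss are assumptions for illustration, not the original project code; the key point is where .detach() is applied:

```python
import torch
import torch.nn as nn

latent_dim, batch_size = 100, 16
G = nn.Sequential(nn.Linear(latent_dim, 784), nn.Tanh())  # toy generator
D = nn.Sequential(nn.Linear(784, 1))                      # toy discriminator
d_optimizer = torch.optim.Adam(D.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

real_images = torch.randn(batch_size, 784)  # stand-in for a batch of real data
z = torch.randn(batch_size, latent_dim)

d_optimizer.zero_grad()

# Real loss: the discriminator should classify real images as real (label 1).
real_loss = criterion(D(real_images), torch.ones(batch_size, 1))

# Fake loss: detach the generator's output so backward() only reaches D.
fake_images = G(z)
fake_loss = criterion(D(fake_images.detach()), torch.zeros(batch_size, 1))

d_loss = real_loss + fake_loss
d_loss.backward()   # no gradients flow into G because of .detach()
d_optimizer.step()  # only D's parameters are updated
```

In the generator's own update step you would not detach, because there the whole point is to let gradients flow from the discriminator's output back into the generator.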