Detach in the Loss class of pix2pixHD

I don’t know the details of this optional assignment, but the general point is that computing gradients is expensive, so you want to avoid building and backpropagating through portions of the compute graph that aren’t needed for the update you are doing at that point. The classic and simplest example is training the discriminator: you don’t need gradients with respect to the generator’s parameters, so you detach the generator’s outputs before feeding them to the discriminator. When you train the generator, however, you cannot detach the discriminator, because the generator’s gradients are computed by backpropagating through the discriminator. In that case you just have to be careful not to *apply* the resulting discriminator gradients, i.e. you only step the generator’s optimizer. Note that detaching during the discriminator step is not a “correctness” issue, only a performance one. Here’s a thread which discusses this point in the context of the simple case I just described.
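Here is a minimal PyTorch sketch of the pattern described above. The tiny `Linear` networks standing in for the generator and discriminator are hypothetical placeholders, not pix2pixHD’s actual models, but the `detach()` logic is the same:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the generator and discriminator (hypothetical;
# pix2pixHD's real networks are much larger, but the detach logic is identical).
G = nn.Linear(4, 4)
D = nn.Linear(4, 1)
opt_G = torch.optim.SGD(G.parameters(), lr=0.01)
opt_D = torch.optim.SGD(D.parameters(), lr=0.01)

z = torch.randn(8, 4)

# --- Discriminator step: detach the generator's output ---
fake = G(z)
loss_D = D(fake.detach()).mean()  # detach: no graph is built back into G
opt_D.zero_grad()
loss_D.backward()
opt_D.step()
# The detach means G received no gradients at all on this step.
assert all(p.grad is None for p in G.parameters())

# --- Generator step: do NOT detach the discriminator ---
loss_G = -D(G(z)).mean()  # gradients must flow *through* D to reach G
opt_G.zero_grad()
opt_D.zero_grad()
loss_G.backward()
opt_G.step()  # only G's optimizer steps; D gets gradients here but is not updated
```

Note that in the generator step, `D`’s parameters do accumulate gradients (they can’t be avoided, since the chain rule passes through them), which is why discriminator gradients are typically zeroed before the next discriminator update.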
