The gradients for the generator flow through the discriminator by definition, so we need the discriminator's graph when training the generator. But the situation is asymmetric: when training the discriminator, we don't need gradients for the generator, so its output can be detached from the graph, as in the sketch below. Here's a thread which discusses this in more detail. Here's another thread that talks about the difference between `detach` and `retain_graph`.
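A minimal sketch of this asymmetry (the networks, sizes, and loss here are placeholders for illustration, not taken from the thread):

```python
import torch
import torch.nn as nn

# Toy stand-ins for a generator and discriminator.
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
D = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(64, 8)      # stand-in for a batch of real samples
noise = torch.randn(64, 16)
fake = G(noise)

# Discriminator step: detach the fake samples, so no gradients
# are computed for (or propagated into) the generator.
loss_D = criterion(D(real), torch.ones(64, 1)) + \
         criterion(D(fake.detach()), torch.zeros(64, 1))
opt_D.zero_grad()
loss_D.backward()
opt_D.step()

# Generator step: gradients must flow through D into G,
# so the fake samples are NOT detached here.
loss_G = criterion(D(fake), torch.ones(64, 1))
opt_G.zero_grad()
loss_G.backward()
opt_G.step()
```

Because `loss_D.backward()` only touched `D(fake.detach())`, the generator's part of the graph survives and can still be used for `loss_G.backward()` without needing `retain_graph=True`.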