Saving computational graph during discriminator backpropagation

The gradients for the generator flow through the discriminator by definition, so we need the discriminator's graph when training the generator. The situation is asymmetric, though: when we train the discriminator, we don't need gradients for the generator, so its output can be detached (see the sketch below). Here's a thread which discusses this in more detail. Here's another thread that talks about the difference between "detach" and "retain_graph".
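A minimal sketch of that asymmetry in PyTorch (a toy example I wrote for illustration, not code from the linked threads): the generator output is detached for the discriminator step, while the generator step backpropagates through the discriminator.

```python
# Toy sketch: tiny G/D networks and random data, just to show where each graph is needed.
import torch
import torch.nn as nn

latent_dim, data_dim, batch_size = 16, 32, 8

generator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(batch_size, data_dim)   # stand-in for a batch of real samples
z = torch.randn(batch_size, latent_dim)
fake = generator(z)                        # builds the generator's graph once

# --- Discriminator step: the generator's graph is not needed, so detach ---
opt_d.zero_grad()
d_loss = (criterion(discriminator(real), torch.ones(batch_size, 1)) +
          criterion(discriminator(fake.detach()), torch.zeros(batch_size, 1)))
d_loss.backward()   # only traverses the discriminator's graph
opt_d.step()

# --- Generator step: gradients must flow through the discriminator into G ---
opt_g.zero_grad()
g_loss = criterion(discriminator(fake), torch.ones(batch_size, 1))
g_loss.backward()   # backprops through D and then through the saved G graph
opt_g.step()
```

Because the discriminator step uses `fake.detach()`, its backward pass never touches the generator's graph, so no `retain_graph=True` is needed; the generator step then reuses the still-intact graph from `fake = generator(z)`.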