Why should we detach the discriminator's input?

I would like to add more details to the answer above.

When we train the Discriminator, we don't need to track operations for the Generator, since we are not going to update the Generator or use its gradients in that step. Detaching the Generator's output from the computational graph therefore saves memory and speeds up training.
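As a minimal sketch of that discriminator step (the names `generator`, `discriminator`, `criterion`, `d_optimizer`, `real_images` and `noise` are hypothetical placeholders, not from the original post):

```python
import torch

d_optimizer.zero_grad()

# Real batch: the discriminator should output "real" (1).
real_pred = discriminator(real_images)
real_loss = criterion(real_pred, torch.ones_like(real_pred))

# Fake batch: detach() cuts the graph at the generator's output,
# so no gradients are computed or stored for the generator here.
fake_images = generator(noise)
fake_pred = discriminator(fake_images.detach())
fake_loss = criterion(fake_pred, torch.zeros_like(fake_pred))

(real_loss + fake_loss).backward()
d_optimizer.step()  # only the discriminator's parameters are updated
```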

When we train the Generator, we need the gradients to flow back through the Discriminator, but we won't update the Discriminator itself. Note that parameters are only updated when optimizer.step() is called, and each model has its own optimizer; backward() only computes gradients and does not change any parameters.

Here is an illustration of the generator training step. (Note that we pass the real label to the discriminator so that the gradients push the generator towards producing data that looks real.)
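A minimal sketch of that generator step, reusing the same hypothetical names as above; notice there is no detach() here, and only the generator's optimizer is stepped:

```python
import torch

g_optimizer.zero_grad()

# No detach: gradients must flow through the discriminator
# back into the generator's parameters.
fake_images = generator(noise)
fake_pred = discriminator(fake_images)

# Real (true) labels: the loss is small when the discriminator
# classifies the generated images as real.
g_loss = criterion(fake_pred, torch.ones_like(fake_pred))

g_loss.backward()   # computes gradients for both models
g_optimizer.step()  # but only the generator's parameters are updated
```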

You can read this thread to gain more intuition about this.
