[C1_W2_Assignment] Discriminator training code: Why not use "fake.requires_grad_ = False" instead of "fake.detach()"?

TRAN_KHANH1 · November 30, 2022, 8:51am

In the code for training the discriminator, there’s this line:

    disc_fake_pred = disc(fake.detach())
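For context, here is a minimal sketch of what `detach()` does in that line. It uses a small `nn.Linear` as a hypothetical stand-in for the assignment's generator, so the names and shapes are assumptions. `detach()` returns a new tensor that shares storage with `fake` but has no autograd history, while `fake` itself still requires gradients.

```python
import torch
from torch import nn

torch.manual_seed(0)
gen = nn.Linear(4, 3)  # hypothetical stand-in for the generator
fake = gen(torch.randn(2, 4))  # generator output, tracked by autograd

fake_detached = fake.detach()  # new tensor with no autograd history

print(fake_detached.requires_grad)  # False
print(fake.requires_grad)  # True: the original tensor is unchanged
# Same underlying memory: detach() does not copy the data.
print(fake_detached.data_ptr() == fake.data_ptr())  # True
```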

I found some internet sources saying that setting requires_grad_ = False will make the computation faster than using detach().

So I tried rewriting it like this:

    fake.requires_grad_ = False 
    disc_fake_pred = disc(fake)

However, I’m not sure whether the two approaches are equivalent (apart from computational performance). And if they are, why was detach() chosen instead of the requires_grad_ approach?
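One detail worth checking in the snippet above: `requires_grad_` is a method, so `fake.requires_grad_ = False` assigns a plain attribute on that tensor object instead of calling it, and gradient tracking stays on. Calling the method, `fake.requires_grad_(False)`, raises a `RuntimeError` because `fake` is a non-leaf tensor (it was computed by the generator), and PyTorch only lets you turn that flag off on leaf tensors. A minimal sketch, again with a hypothetical `nn.Linear` generator stand-in:

```python
import torch
from torch import nn

torch.manual_seed(0)
gen = nn.Linear(4, 3)  # hypothetical stand-in for the generator
fake = gen(torch.randn(2, 4))  # non-leaf: produced by an operation

# This stores False in an attribute named requires_grad_ on this tensor
# object, shadowing the method; autograd is not affected at all.
fake.requires_grad_ = False
print(fake.requires_grad)  # True

# Actually calling the method on a non-leaf tensor is rejected.
fake2 = gen(torch.randn(2, 4))
try:
    fake2.requires_grad_(False)
    raised = False
except RuntimeError as err:
    raised = True
    print(err)
print(raised)  # True
```

The error message PyTorch raises here points to `detach()` as the way to get a non-differentiable version of a computed tensor.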

gautamaltman · November 30, 2022, 11:17pm

Hi Tran!

Have you seen this post on the difference between detach() and requires_grad?

" The returned result will be the same, but the version with torch.no_grad will use less memory because it knows from the beginning that no gradients are needed so it doesn’t need to keep intermediary results."
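The quoted answer compares `detach()` with `torch.no_grad()`, which is the option that actually saves memory. A minimal sketch of using it for the discriminator's fake input, assuming hypothetical `nn.Linear` stand-ins for the generator and discriminator: the generator's forward pass runs inside `torch.no_grad()`, so no graph or intermediate activations are kept for it, while the discriminator still runs with autograd enabled and receives gradients.

```python
import torch
from torch import nn

torch.manual_seed(0)
gen = nn.Linear(4, 3)   # hypothetical generator stand-in
disc = nn.Linear(3, 1)  # hypothetical discriminator stand-in
noise = torch.randn(2, 4)

# Generator forward pass without autograd: no graph, no saved activations.
with torch.no_grad():
    fake = gen(noise)
print(fake.requires_grad)  # False

# The discriminator runs outside the no_grad block, so it still gets grads.
disc_fake_pred = disc(fake)
disc_fake_pred.mean().backward()
print(disc.weight.grad is not None)  # True
print(gen.weight.grad is None)  # True: nothing flowed back to the generator
```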

I tried training the discriminator with the alternative requires_grad_ = False and couldn’t discern any meaningful difference from using detach(). Tran, did you observe the same thing?
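A likely reason no difference shows up: as written, `fake.requires_grad_ = False` leaves autograd untouched, so the discriminator receives exactly the same gradients as with `detach()`. The only change is that the discriminator's backward pass also flows into the generator's parameters, which costs extra computation rather than saving it. Assuming the training loop zeroes the generator's gradients before the generator's own backward pass (a common pattern, assumed here rather than taken from the assignment), those stray gradients never reach the generator's update. A sketch comparing the two, with hypothetical `nn.Linear` stand-ins:

```python
import torch
from torch import nn

torch.manual_seed(0)
gen = nn.Linear(4, 3)   # hypothetical generator stand-in
disc = nn.Linear(3, 1)  # hypothetical discriminator stand-in
noise = torch.randn(2, 4)

def disc_grads(use_detach):
    """Run one discriminator backward pass; return (disc grad, gen grad)."""
    gen.zero_grad(set_to_none=True)
    disc.zero_grad(set_to_none=True)
    fake = gen(noise)
    if use_detach:
        disc_fake_pred = disc(fake.detach())
    else:
        fake.requires_grad_ = False  # attribute assignment only, as in the thread
        disc_fake_pred = disc(fake)
    disc_fake_pred.mean().backward()
    return disc.weight.grad.clone(), gen.weight.grad

d_detach, g_detach = disc_grads(use_detach=True)
d_attr, g_attr = disc_grads(use_detach=False)

print(torch.allclose(d_detach, d_attr))  # True: same discriminator gradients
print(g_detach is None)  # True: detach() stops gradients at fake
print(g_attr is not None)  # True: gradients also reached the generator
```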

Some of my colleagues tell me that using the requires_grad flag is the best practice.

Wendy · December 1, 2022, 5:32pm

@TRAN_KHANH1, to add to @gautamaltman’s comments:

I can think of one reason you might choose detach(). Since detach() creates a new tensor, if you pass fake.detach() to the discriminator, you could theoretically still use the same fake with your generator, where you do want the gradients. I think some of the assignments take advantage of this.
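The reuse pattern described here can be sketched as follows. This is a minimal, hypothetical training step with `nn.Linear` stand-ins, `BCEWithLogitsLoss`, and SGD optimizers chosen for illustration, not the assignment's actual code: `fake` is generated once, the discriminator step sees `fake.detach()`, and the generator step reuses the same `fake` with its graph intact.

```python
import torch
from torch import nn

torch.manual_seed(0)
gen = nn.Linear(4, 3)   # hypothetical generator stand-in
disc = nn.Linear(3, 1)  # hypothetical discriminator stand-in
criterion = nn.BCEWithLogitsLoss()
disc_opt = torch.optim.SGD(disc.parameters(), lr=0.1)
gen_opt = torch.optim.SGD(gen.parameters(), lr=0.1)

noise = torch.randn(2, 4)
fake = gen(noise)  # generated once, reused for both updates

# Discriminator step: detach() so this backward pass stops at fake.
disc_opt.zero_grad()
disc_fake_pred = disc(fake.detach())
disc_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
disc_loss.backward()
disc_opt.step()
gen_grad_after_disc_step = gen.weight.grad
print(gen_grad_after_disc_step is None)  # True: generator untouched so far

# Generator step: the same fake is still attached to the generator's graph.
# (This backward also fills disc grads; disc_opt.zero_grad() clears them
# at the start of the next discriminator step.)
gen_opt.zero_grad()
gen_fake_pred = disc(fake)
gen_loss = criterion(gen_fake_pred, torch.ones_like(gen_fake_pred))
gen_loss.backward()
gen_opt.step()
print(gen.weight.grad is not None)  # True
```

If `fake` had been produced under `torch.no_grad()` instead, the generator step would need a freshly generated batch, since that `fake` would have no graph back to the generator.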

I suspect the course developers used detach() in all assignments for consistency to help students focus on the main concepts, but you’re absolutely right that as long as you don’t need/want to reuse fake, the requires_grad_ approach is more efficient.

TRAN_KHANH1 · December 5, 2022, 4:48am

I didn’t observe any difference between the two either. However, the purpose of my question was to understand why detach() was chosen in this specific case instead of requires_grad. But as @Wendy pointed out, it’s most likely for consistency.
