[C1_W2_Assignment] Discriminator training code: Why not use "fake.requires_grad_ = False" instead of "fake.detach()"?

In the code for training the discriminator, there's this line of code:

    disc_fake_pred = disc(fake.detach())
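
For context, that line sits roughly in this part of the discriminator update (paraphrased from memory, so names like get_noise, criterion, and disc_opt may not match the notebook exactly):

    disc_opt.zero_grad()
    fake_noise = get_noise(cur_batch_size, z_dim, device=device)
    fake = gen(fake_noise)
    disc_fake_pred = disc(fake.detach())  # detach: no gradients flow back into gen
    disc_fake_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
    disc_real_pred = disc(real)
    disc_real_loss = criterion(disc_real_pred, torch.ones_like(disc_real_pred))
    disc_loss = (disc_fake_loss + disc_real_loss) / 2
    disc_loss.backward()
    disc_opt.step()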

I found some internet sources saying that setting requires_grad_ = False will make the computation faster than using detach().

So I tried changing it to the following:

    fake.requires_grad_ = False 
    disc_fake_pred = disc(fake)

However, I'm not sure whether the two approaches are equivalent (apart from computational performance). And if they are the same, why not choose the requires_grad_ approach?

Hi Tran!

Did you see this post on the difference between detach() and requires_grad? It says:

" The returned result will be the same, but the version with torch.no_grad will use less memory because it knows from the beginning that no gradients are needed so it doesn’t need to keep intermediary results."

I tried running the discriminator with the alternative requires_grad_ = False version and could not discern a meaningful difference from using detach(). Tran, did you observe the same thing?

I'm hearing from some of my colleagues that using the requires_grad flag is considered best practice.

@TRAN_KHANH1, to add to @gautamaltman’s comments:

I can think of one reason you might choose detach(). Since detach() creates a new tensor, if you use fake.detach() for the discriminator, you can still use the same fake with your generator, where you do want the gradients. I think some of the assignments take advantage of this.
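
Here's a tiny self-contained sketch of that pattern, with toy linear layers standing in for gen and disc (the shapes and names are just for illustration, not the assignment's architecture):

    import torch
    from torch import nn

    # Toy stand-ins for the generator and discriminator
    gen = nn.Linear(10, 784)
    disc = nn.Linear(784, 1)
    criterion = nn.BCEWithLogitsLoss()

    noise = torch.randn(32, 10)
    fake = gen(noise)

    # Discriminator step: detach() gives a new tensor cut off from gen's graph,
    # so this backward() only populates disc's gradients
    disc_fake_pred = disc(fake.detach())
    disc_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
    disc_loss.backward()
    print(gen.weight.grad)  # None: nothing flowed back into the generator

    # Generator step: the ORIGINAL fake still carries its graph back to gen,
    # so it can be reused here and this backward() does reach gen's parameters
    gen_pred = disc(fake)
    gen_loss = criterion(gen_pred, torch.ones_like(gen_pred))
    gen_loss.backward()
    print(gen.weight.grad is not None)  # True: gradients reached the generator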

I suspect the course developers used detach() in all assignments for consistency to help students focus on the main concepts, but you’re absolutely right that as long as you don’t need/want to reuse fake, the requires_grad_ approach is more efficient.

I didn't observe a difference between the two either. However, the purpose of my question was to understand why detach() was chosen in this specific case instead of requires_grad. But as @Wendy pointed out, it's most probably for consistency.