Did you see this post on the difference between detach() and requires_grad:
"The returned result will be the same, but the version with torch.no_grad will use less memory because it knows from the beginning that no gradients are needed so it doesn’t need to keep intermediary results."
I tried running the discriminator using the alternative, requires_grad_(False), and could not discern a meaningful difference between that and using detach(). Tran, did you observe the same thing?
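For anyone who wants to check this on their own machine, here is a minimal sketch of the three options side by side. The names gen and noise are hypothetical stand-ins (a single linear layer instead of a real generator), so treat it as an illustration, not the course code:

```python
import torch
from torch import nn

torch.manual_seed(0)
gen = nn.Linear(4, 3)       # hypothetical stand-in for a generator
noise = torch.randn(2, 4)

# Option A: build the autograd graph, then cut the result loose with detach()
fake_a = gen(noise).detach()

# Option B: never build the graph in the first place
with torch.no_grad():
    fake_b = gen(noise)

# Option C: temporarily switch off gradients on the generator's parameters
gen.requires_grad_(False)
fake_c = gen(noise)
gen.requires_grad_(True)    # restore so the generator can still be trained

# All three hold the same values, and none is attached to a graph
print(torch.allclose(fake_a, fake_b), torch.allclose(fake_a, fake_c))
print(fake_a.requires_grad, fake_b.requires_grad, fake_c.requires_grad)
```

The outputs match, which fits what we both saw. Any memory difference comes from option A building the generator's graph before cutting it, and that is hard to notice on a model this small.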
I’m hearing from some of my colleagues that using the requires_grad flag is best practice.
I can think of one reason you might choose detach() instead. Since detach() returns a new tensor and leaves the original attached to the graph, you can pass fake.detach() to the discriminator and still use the same fake for the generator update, where you do want the gradients. I think some of the assignments take advantage of this.
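Here is a rough sketch of that reuse pattern. The gen, disc, and criterion names are hypothetical placeholders (single linear layers, no optimizers), so it only shows where the gradients go and is not a full training loop:

```python
import torch
from torch import nn

torch.manual_seed(0)
gen = nn.Linear(4, 3)       # hypothetical stand-in for a generator
disc = nn.Linear(3, 1)      # hypothetical stand-in for a discriminator
criterion = nn.BCEWithLogitsLoss()
noise = torch.randn(2, 4)

fake = gen(noise)           # still attached to the generator's graph

# Discriminator step: detach() blocks gradients from reaching the generator
disc_loss = criterion(disc(fake.detach()), torch.zeros(2, 1))
disc_loss.backward()
gen_grad_after_disc_step = gen.weight.grad   # generator was not touched

# Generator step: reuse the very same fake, this time with gradients flowing
disc.zero_grad()
gen_loss = criterion(disc(fake), torch.ones(2, 1))
gen_loss.backward()
gen_grad_after_gen_step = gen.weight.grad    # generator now has gradients

print(gen_grad_after_disc_step is None, gen_grad_after_gen_step is not None)
```

If fake had been produced under torch.no_grad(), the generator step would need a second forward pass through gen, which is the trade-off against the memory saving.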
I suspect the course developers used detach() in all assignments for consistency to help students focus on the main concepts, but you’re absolutely right that as long as you don’t need/want to reuse fake, the requires_grad_ approach is more efficient.
I didn’t observe a difference between the two either. However, the purpose of my question was to understand why detach() was chosen in this specific case instead of requires_grad. But as @Wendy pointed out, it’s most probably for consistency.