When training the generator, .detach() is not called on the discriminator: the generator's cost is, by definition, computed from the discriminator's output, so gradients must flow back through the discriminator (see Detach() used in Assignment 4 - #4 by paulinpaloalto). Now, if we don't call disc_opt.zero_grad() before the generator training step (i.e. after the discriminator training step), won't the gradients from the previous discriminator update accumulate in the discriminator's parameters while we train the generator?
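
For concreteness, here is a minimal sketch of the loop structure I am asking about. The names (gen, disc, gen_opt, disc_opt, criterion, real, noise) and the tiny linear stand-in models are placeholders I made up, not the assignment's actual code:

```python
import torch
from torch import nn

# Placeholder models/optimizers, not the assignment's real architectures.
gen = nn.Linear(10, 784)           # "generator": noise -> fake image
disc = nn.Linear(784, 1)           # "discriminator": image -> real/fake logit
gen_opt = torch.optim.Adam(gen.parameters(), lr=2e-4)
disc_opt = torch.optim.Adam(disc.parameters(), lr=2e-4)
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(16, 784)        # placeholder batch of "real" images
noise = torch.randn(16, 10)

# --- Discriminator step ---
disc_opt.zero_grad()                                  # clear old disc gradients
fake = gen(noise)
disc_fake_pred = disc(fake.detach())                  # detach: no grads flow into gen here
disc_real_pred = disc(real)
disc_loss = (criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
             + criterion(disc_real_pred, torch.ones_like(disc_real_pred))) / 2
disc_loss.backward()                                  # fills disc parameters' .grad
disc_opt.step()                                       # step() does NOT clear .grad

# --- Generator step (no detach, and no disc_opt.zero_grad() here) ---
gen_opt.zero_grad()                                   # clear old gen gradients
disc_fake_pred = disc(gen(noise))                     # grads must flow through disc into gen
gen_loss = criterion(disc_fake_pred, torch.ones_like(disc_fake_pred))
gen_loss.backward()                                   # this also adds to disc parameters' .grad
gen_opt.step()                                        # but only gen's parameters are updated
```

My question is about the gen_loss.backward() call above: since the discriminator is not detached there, it writes gradients into the discriminator's .grad buffers on top of whatever was left from the earlier disc_loss.backward().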