Hi, I have some thoughts and a question regarding the Deep Convolutional GAN (DCGAN) training procedure provided in the assignment.
Here is the code for reference:
## Update discriminator ##
disc_opt.zero_grad()
fake_noise = get_noise(cur_batch_size, z_dim, device=device)
fake = gen(fake_noise)
disc_fake_pred = disc(fake.detach())
disc_fake_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
disc_real_pred = disc(real)
disc_real_loss = criterion(disc_real_pred, torch.ones_like(disc_real_pred))
disc_loss = (disc_fake_loss + disc_real_loss) / 2
# Keep track of the average discriminator loss
mean_discriminator_loss += disc_loss.item() / display_step
# Update gradients
disc_loss.backward(retain_graph=True)
# Update optimizer
disc_opt.step()
## Update generator ##
gen_opt.zero_grad()
fake_noise_2 = get_noise(cur_batch_size, z_dim, device=device)
fake_2 = gen(fake_noise_2)
disc_fake_pred = disc(fake_2)
gen_loss = criterion(disc_fake_pred, torch.ones_like(disc_fake_pred))
gen_loss.backward()
gen_opt.step()
The first thing I’ve noticed is that when we generate fake images for the first time:
fake = gen(fake_noise)
autograd records a computation graph for the generator’s forward pass, which adds time and memory overhead. This would be fine if we reused the result in the generator update, but instead a new fake batch fake_2 is generated. In that case, it is better to wrap the call in a torch.no_grad() context manager; since no graph is built, the .detach() call becomes unnecessary:
with torch.no_grad():
    fake = gen(fake_noise)
disc_fake_pred = disc(fake)
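For completeness, here is a sketch of how the whole discriminator update could look with this change (using the same variable names and helpers as the assignment code):

## Update discriminator (no_grad variant) ##
disc_opt.zero_grad()
fake_noise = get_noise(cur_batch_size, z_dim, device=device)
with torch.no_grad():
    fake = gen(fake_noise)  # no generator graph is recorded
disc_fake_pred = disc(fake)  # the discriminator graph starts here
disc_fake_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
disc_real_pred = disc(real)
disc_real_loss = criterion(disc_real_pred, torch.ones_like(disc_real_pred))
disc_loss = (disc_fake_loss + disc_real_loss) / 2
disc_loss.backward()  # retain_graph is not needed either (see below)
disc_opt.step()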
Alternatively, we could reuse these examples to update the generator without generating new fake images, keeping the discriminator update unchanged. (Note that this option requires the graph from gen(fake_noise) to still exist, so it cannot be combined with the no_grad variant above.)
## Update generator ##
gen_opt.zero_grad()
disc_fake_pred = disc(fake)  # reuse the fake batch from the discriminator step
gen_loss = criterion(disc_fake_pred, torch.ones_like(disc_fake_pred))
gen_loss.backward()
gen_opt.step()
This avoids a second generator forward pass and leads to similar results. The use of fake_noise and fake_noise_2 implies separate noise vectors for the discriminator and generator updates. While this works, it might introduce unnecessary variability.
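If that variability is a concern, the same noise can simply be reused with a fresh forward pass, which also combines cleanly with the no_grad variant above (a sketch; it trades a second generator forward pass for not having to keep the first graph alive):

## Update generator (same noise, fresh forward pass) ##
gen_opt.zero_grad()
fake = gen(fake_noise)  # identical samples: only the discriminator has been updated so far
disc_fake_pred = disc(fake)
gen_loss = criterion(disc_fake_pred, torch.ones_like(disc_fake_pred))
gen_loss.backward()
gen_opt.step()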
My question is: what is the reason for generating new fake examples?
Another thought is about the retain_graph=True option in the discriminator gradient computation. There is no need for this option: since fake is detached, the backward pass never touches the generator’s graph, and the generator update builds a brand-new graph in its own forward pass here:
disc_fake_pred = disc(fake_2)
In some cases, such as multitask learning where two losses are computed from the outputs of different heads on top of a shared network, it is necessary to retain the graph after backpropagating the first loss, but that is not the case here (see the sketch below).
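For contrast, here is a minimal, self-contained sketch of that multitask situation (a hypothetical shared encoder with two heads, invented purely for illustration), where calling backward twice over the shared part of the graph does require retain_graph=True:

import torch
import torch.nn as nn

# Hypothetical setup: two heads share one encoder, and each loss
# gets its own backward call over the shared part of the graph.
encoder = nn.Linear(16, 8)
head_a = nn.Linear(8, 1)
head_b = nn.Linear(8, 1)

x = torch.randn(4, 16)
shared = encoder(x)                  # this subgraph is traversed by both backward calls
loss_a = head_a(shared).pow(2).mean()
loss_b = head_b(shared).pow(2).mean()

loss_a.backward(retain_graph=True)   # keep the shared graph alive
loss_b.backward()                    # would raise a RuntimeError without retain_graph above

(Of course, summing the two losses and calling backward once would accumulate the same gradients without retaining the graph; the two-call pattern is only needed when the backward passes must happen separately.)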