Problem with the images the generator produces during training

The generator tends to generate only the number “1” as training goes on.
When the step is 10000: (now the numbers are still diverse)

And when the step is 34000: (only 1 and 9)

When the step goes to 49000: (only 1)

But according to the material provided with the notebook, the generated numbers should still be diverse even after 90000 steps…
Why would this happen?

Hey @apricot, welcome to the community! This is known as Mode Collapse, which you will study in the upcoming weeks. The GAN gets stuck in a local minimum and is unable to get out of it.

In simple words, early in training the discriminator has trouble telling real from fake for a certain class, say “1” in your case, so the generator keeps producing more and more 1s in order to fool it. Eventually the discriminator catches up and tells the generator that all of its 1s are fake. At that point the generator doesn’t know where to go, and its training effectively stalls.

As to why this happened, your GAN might have gotten stuck because of the random noise vectors, and in the upcoming weeks you will learn ways to avoid this! However, I am not sure, since it didn’t happen when I ran this notebook.

Thank you for explaining the concept of “Mode Collapse”! :wink:
But the weird thing is that the code “torch.manual_seed(0)” should have made everyone’s results the same, so I guess there might be a mistake in my code that the auto-grading system didn’t detect :face_with_monocle:
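As a side note on the seed, here is a minimal sketch (not taken from the assignment; `sample_noise` and the sizes are made up for illustration) of what `torch.manual_seed` does and does not guarantee. The same seed gives the same noise only if the same number of random values is drawn in the same order, so two runs that draw different amounts of noise can diverge even with an identical seed.

```python
import torch

def sample_noise(seed, n_samples=4, z_dim=8):
    # Hypothetical helper: re-seed the global RNG, then draw one noise batch.
    torch.manual_seed(seed)
    return torch.randn(n_samples, z_dim)

a = sample_noise(0)
b = sample_noise(0)
c = sample_noise(1)
print(torch.equal(a, b))  # same seed and same draws: identical noise
print(torch.equal(a, c))  # different seed: different noise

# Same seed, but a different amount of noise drawn first:
# the values drawn afterwards no longer match.
torch.manual_seed(0)
_ = torch.randn(4, 8)
after_small = torch.randn(2, 8)

torch.manual_seed(0)
_ = torch.randn(6, 8)
after_large = torch.randn(2, 8)

print(torch.equal(after_small, after_large))
```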

Could be true, since the parameters that we were given to specify didn’t lead to mode collapse, as far as I can recall.

@apricot, you’re right that this seems like it’s somehow specific to your code. If you’re up for it, let’s dig in a bit to see if we can find where the problem is. This could help future students, and/or help us improve the tests to catch this in the future.

First to confirm: your assignment passed the auto-grading with full points, and also all the test blocks in the assignment succeeded (printed “Success!” when you ran the blocks)?

Unfortunately, it’s hard to guess where the problem is, since something relatively subtle might tip the scales and start leading to mode collapse. Maybe in either the discriminator or the discriminator loss calculation, since, as @Elemento explained, mode collapse can start if the discriminator starts to detect problems with some digits more easily than others, leading the generator to choose to generate the images that the discriminator has the most problems with.

Can you check that your code is consistent with the “Optional Hints” given in the assignment, esp. for the discriminator and discriminator loss functions? Is there any code that you’re suspicious of - maybe something that you weren’t quite sure of when you implemented it?

Hi Wendy, I’ve double checked that my assignment got full points and all test blocks print “success” when I ran them, plus, I’ve followed all hints.
For the discriminator block, I use a Linear layer followed by LeakyReLU with negative slope of 0.2, and for the final layer of discriminator, I also use a Linear layer to reduce the dimension (hidden_dim to 1). I think those should be correct : )
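For readers following along, the structure described above could be sketched roughly like this. It is only an illustration: the layer sizes (`im_dim`, `hidden_dim`) and the single hidden block are assumptions, and the assignment's own discriminator may stack more blocks.

```python
import torch
from torch import nn

def get_discriminator_block(input_dim, output_dim):
    # One block: Linear layer followed by LeakyReLU with negative slope 0.2.
    return nn.Sequential(
        nn.Linear(input_dim, output_dim),
        nn.LeakyReLU(0.2),
    )

im_dim = 784      # assumed: flattened 28x28 image
hidden_dim = 16   # assumed: small value just for this sketch

disc = nn.Sequential(
    get_discriminator_block(im_dim, hidden_dim),
    nn.Linear(hidden_dim, 1),  # final layer: hidden_dim -> 1, no activation
)

out = disc(torch.randn(5, im_dim))
print(out.shape)  # one score per image
```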
And for the loss calculation, here is the code:
Are there any mistakes…? Or could I send my full work to you privately?

Hey @apricot,
I don’t know if you have changed the code yourself or if the assignments have been updated, but my code differs a bit from what you have posted. My code goes like:

noise_vec = get_noise(num_images, z_dim, device)
fake = gen(noise_vec).detach()

exa_vec = noise_vec[ : , 0].reshape(num_images, 1)

# Step 2
fake_pred = disc(fake)
fake_true_label = torch.zeros_like(exa_vec)
fake_loss = criterion(fake_pred, fake_true_label)

# Step 3
real_pred = disc(real)
real_true_label = torch.ones_like(exa_vec)
real_loss = criterion(real_pred, real_true_label)

# Step 4
disc_loss = (real_loss + fake_loss) / 2

Our code blocks seem to be exactly the same, except for the line of code:
exa_vec = noise_vec[ : , 0].reshape(num_images, 1)

Try adding this line, and see if you can get the correct results. This line of code does nothing but simply reshape the input for the next few lines of code. If this doesn’t help, then I guess sharing your notebook would be better!
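To illustrate why the extra line should not change anything, here is a small sketch with stand-in tensors (`fake_pred` below is random data, not a real discriminator output). It assumes the discriminator returns one score per image, so its output already has shape `(num_images, 1)`.

```python
import torch

num_images, z_dim = 6, 10
noise_vec = torch.randn(num_images, z_dim)
fake_pred = torch.randn(num_images, 1)  # stand-in for disc(fake)

# Labels built from the reshaped first column of the noise...
exa_vec = noise_vec[:, 0].reshape(num_images, 1)
labels_a = torch.zeros_like(exa_vec)

# ...versus labels built directly from the predictions.
labels_b = torch.zeros_like(fake_pred)

print(labels_a.shape, labels_b.shape)
print(torch.equal(labels_a, labels_b))
```

Both label tensors come out identical, so under that assumption the two versions of the loss code behave the same.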

@apricot, this all looks good as far as I can see! I’m happy to take a look at your code if you want to DM it to me.

Hi Elemento, thanks for sharing your code! I’ve tried your way, but it doesn’t work in my case, so I’ve shared my code with Wendy.
Also, I think the assignment has been updated, as I didn’t change the code.
I’ll post the result when I get a response from Wendy : )

Hi Elemento, Wendy has helped me find out where the problem is:

In the training process, I carelessly used different numbers of images for training the discriminator and the generator. Both numbers should be “cur_batch_size”, but previously I used num_images.
Now it works : )
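For anyone who hits the same issue, here is a rough sketch of the idea, not the assignment code: the dataset is random data, `get_noise` is a simplified stand-in without the `device` argument, and the sizes are made up. The point is to take the batch size from the current real batch and use that same number for both the discriminator and the generator update.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

z_dim = 4
dataset = TensorDataset(torch.randn(10, 784))  # 10 random stand-in "images"
loader = DataLoader(dataset, batch_size=4)     # last batch is smaller

def get_noise(n_samples, z_dim):
    # Simplified stand-in for the notebook's noise helper.
    return torch.randn(n_samples, z_dim)

batch_sizes = []
for (real,) in loader:
    cur_batch_size = len(real)

    # Use cur_batch_size for BOTH updates, not a fixed number of images.
    disc_noise = get_noise(cur_batch_size, z_dim)  # for the discriminator step
    gen_noise = get_noise(cur_batch_size, z_dim)   # for the generator step

    assert disc_noise.shape[0] == gen_noise.shape[0] == real.shape[0]
    batch_sizes.append(cur_batch_size)

print(batch_sizes)
```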
