Week 4 lab Generator not training

I have worked through the Week 4 lab up to the training cell (UNQ_C4) and can run that cell without errors. However, the generator does not appear to be learning: when I run all the way through the epoch loop, it keeps producing flat grey images. Has anyone encountered this?

And the loss history is quite bad - the generator is clearly losing!
[image: loss history plot]

I think I followed all the instructions up to that cell. Any insights on what could be causing this? In the W-GAN lab I noticed that, unlike the regular GAN, the initial images from the generator were flat grey rather than noisy (like TV fuzz), just as in this lab. But as training went on, the generator improved, and I guessed that the W-GAN helped control the gradients better. In this Week 4 lab, though, the generator keeps producing flat grey images.

Thanks,
Steve

Hi Steve,

My thoughts here would be:

a) to check that the discriminator and generator are not training at the same time;

b) to check that you have correctly chosen and implemented the loss(es);

c) to check the learning rate chosen for the optimizer.

Some other naive mistake might have been committed, so recheck the lab from scratch.

That's what I can think of.

Update: I found that @Elemento had actually already answered my question, because @Xiaojian_Deng made the same mistake I did. @Elemento’s answer is here. The gist of his answer is:

Now, though we want to call detach on the fake images when we are updating the discriminator (since we don’t want to update the generator in this case), we don’t want the same thing to happen when we are updating the generator. Hence, when you changed the position of the detach method, the generator didn’t update at all, which led to empty squares.


Thank you, @gent.spah, for the checklist! I found the problem, and I believe it is related to item (a) of your checklist: I had placed the “.detach()” call in the wrong place.

This shows my ignorance of how to use the .detach() method. The detach() docs say

Returns a new Tensor, detached from the current graph.
The result will never require gradient.

but the docs don’t show examples of where it should be placed.
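To see what the docs mean in practice, here is a tiny, self-contained example (my own illustration, not from the lab) of how detach() cuts a tensor out of the autograd graph:

```python
import torch

# Minimal illustration of detach(): gradients cannot flow back
# through a detached tensor.
x = torch.tensor([2.0], requires_grad=True)

z = x * 3              # stays connected to the autograd graph
y = (x * 3).detach()   # same values, but cut from the graph

print(z.requires_grad)  # True
print(y.requires_grad)  # False

z.sum().backward()      # backprop works through z ...
print(x.grad)           # tensor([3.])
# y.sum().backward()    # ... but would raise an error through y,
                        #     since y has no graph to backpropagate through
```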

For where to place the .detach() in Week 4’s lab, I was following the pattern of the Week 1 lab.

Why is the Week 4 lab different, and why does placing the .detach() in the wrong location in the Week 4 lab cause the training to fail?

BTW, this conversation, “Why should we detach the discriminators input ?!”, is very relevant, but after reading it through I don’t think it answers my question of where to put the .detach() method.

Thanks,
Steve


I’m happy you got it solved, but please don’t post code solutions here.

Got it, thanks @gent.spah. I wouldn’t have posted my code, but I found that many other posters had done so in order to express their ideas (including those I linked to above). Well, I suppose just because others do it doesn’t mean that I should.


The training loop in the Week 1 assignment is a bit different. There are two separate functions for computing the losses of the generator and the discriminator, so the detached tensor is used only inside the get_disc_loss function, while the generator's output is left un-detached and is later used for computing the generator loss with its graph intact.
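Roughly, that pattern has this shape (just a sketch of the structure, not the assignment code; the helper names, arguments, and criterion are placeholders I made up):

```python
import torch

# Two separate loss helpers: detach() appears only inside the
# discriminator-loss helper, so no gradients ever flow into the
# generator during the discriminator update.
def disc_loss_sketch(gen, disc, criterion, real, noise):
    fake = gen(noise)
    fake_pred = disc(fake.detach())   # detach: cut the generator out of the graph
    real_pred = disc(real)
    fake_loss = criterion(fake_pred, torch.zeros_like(fake_pred))
    real_loss = criterion(real_pred, torch.ones_like(real_pred))
    return (fake_loss + real_loss) / 2

def gen_loss_sketch(gen, disc, criterion, noise):
    fake = gen(noise)                 # fresh forward pass, graph intact
    fake_pred = disc(fake)            # no detach: gradients must reach the generator
    return criterion(fake_pred, torch.ones_like(fake_pred))
```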

In the Week 4 assignment there are no separate functions for computing the two losses, and the variables are re-used. The fake tensor is passed to the discriminator twice: once for computing the discriminator loss and a second time for computing the generator loss. Only in the first case do you want to use the detached tensor.
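In code, that single-loop pattern looks roughly like this (again only a sketch under assumed names, not the assignment solution; gen, disc, gen_opt, disc_opt, criterion, z_dim, device, and dataloader are assumed to already exist):

```python
for real, _ in dataloader:
    cur_batch_size = real.size(0)
    noise = torch.randn(cur_batch_size, z_dim, device=device)
    fake = gen(noise)                      # one fake batch, reused below

    # Discriminator update: detach() so no gradients reach the generator
    disc_opt.zero_grad()
    disc_fake_pred = disc(fake.detach())
    disc_real_pred = disc(real)
    disc_loss = (criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred)) +
                 criterion(disc_real_pred, torch.ones_like(disc_real_pred))) / 2
    disc_loss.backward()
    disc_opt.step()

    # Generator update: the same fake tensor, WITHOUT detach(),
    # so gradients can flow back through the generator
    gen_opt.zero_grad()
    gen_fake_pred = disc(fake)
    gen_loss = criterion(gen_fake_pred, torch.ones_like(gen_fake_pred))
    gen_loss.backward()
    gen_opt.step()
```

If the detach() is instead applied during the generator update, gen_loss.backward() has no path back to the generator's parameters, so the generator never changes, which matches the flat grey images described above.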

I hope I explained it clearly. I am still not 100% sure where detach() should be called in general, but this is the explanation I found for this case.