Optimization Order

In the 1st assignment “Your First GAN”, we have chosen to run forward and backward propagation on the discriminator first and then the generator. Is there any specific reason to choose that order? Can we reverse the order?



Interesting question. It’s been quite a while since I watched the lectures for this course, so I forget whether Prof Zhou discusses this point. Just on general principles, some of the discriminator’s inputs are real images, so it may have a better chance of learning something meaningful even before the generator has had any training. Of course, training the generator depends on the feedback of the discriminator, so maybe it’s better to start in the given order: you’re more likely to make meaningful progress even on the first iteration if the discriminator’s feedback to the generator is not purely random.

But that’s just my intuition. We do all this in a loop that gets repeated many times, so maybe it doesn’t matter which you do first. Try it the other way and see what happens. Science! :nerd_face:


Thanks @paulinpaloalto, your response aligns with my research. Here is what I found: by updating the discriminator first, we give it at least one training step on real and fake images, so it is somewhat better at telling them apart before the generator is scored against it. This makes the task for the generator more challenging, which is essential for its learning process. If we update the generator first, we compute its loss with a discriminator that, on the first iteration, has not been trained at all, so the feedback to the generator might not be reliable.
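
To make the two orderings concrete, here is a minimal PyTorch sketch of one training iteration with a switch for which network is updated first. It is not the assignment's code: the tiny networks, the sizes, the learning rate, the stand-in "real" batch, and the helper names `disc_step`, `gen_step`, and `train_iteration` are all assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical sizes and toy networks, not the assignment's architectures.
z_dim, data_dim, batch = 8, 2, 64
gen = nn.Sequential(nn.Linear(z_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
disc = nn.Sequential(nn.Linear(data_dim, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1))

criterion = nn.BCEWithLogitsLoss()
gen_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)


def disc_step(real):
    """One discriminator update on a real batch and a generated batch."""
    disc_opt.zero_grad()
    noise = torch.randn(real.size(0), z_dim)
    fake = gen(noise).detach()  # detach: no gradients flow into the generator
    fake_logits = disc(fake)
    real_logits = disc(real)
    loss = 0.5 * (criterion(fake_logits, torch.zeros_like(fake_logits))
                  + criterion(real_logits, torch.ones_like(real_logits)))
    loss.backward()
    disc_opt.step()
    return loss.item()


def gen_step(n):
    """One generator update, scored by the discriminator in its current state."""
    gen_opt.zero_grad()
    noise = torch.randn(n, z_dim)
    logits = disc(gen(noise))
    # The generator wants its fakes to be labelled as real (target = 1).
    loss = criterion(logits, torch.ones_like(logits))
    loss.backward()  # also fills disc grads; disc_step zeroes them before use
    gen_opt.step()
    return loss.item()


def train_iteration(real, disc_first=True):
    """Run both updates once, in the requested order."""
    if disc_first:
        d_loss = disc_step(real)
        g_loss = gen_step(real.size(0))
    else:
        g_loss = gen_step(real.size(0))
        d_loss = disc_step(real)
    return d_loss, g_loss


real_batch = torch.randn(batch, data_dim) + 3.0  # stand-in for real images
d_loss, g_loss = train_iteration(real_batch, disc_first=True)
print(f"d_loss={d_loss:.4f}  g_loss={g_loss:.4f}")
```

Calling `train_iteration(real_batch, disc_first=False)` in a loop instead gives the reversed order, so the two variants can be compared directly. With `disc_first=False`, the very first generator loss is computed against a randomly initialized discriminator, which is the case discussed above.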

Yes, that makes sense, but it’s only a question of what happens on the first iteration, right? Once we’re past that then we’re off to the races. Typically we’re doing at a minimum hundreds and a lot of times more like thousands or tens of thousands of training iterations, so maybe this is all in the noise in reality.

But still, we might as well start off on the “right foot” if we can. Or to put it another way, why waste the first iteration if you don’t have to?

I believe it should only matter in the first iteration, but I’m not 100% sure. It might have a chain effect on the subsequent iterations and delay convergence.