I just finished my first week’s assignment but had a few questions while working on it:
When calculating the loss for the generator and the discriminator, why are we comparing a matrix of zeros with the fake image for the generator, and a matrix of ones with the real image for the discriminator? Perhaps it makes sense in the early epochs, when neither the generator nor the discriminator is smart enough, but as we progress through the epochs shouldn't we compare the real and fake images with each other for both of their losses and optimize for that?
Why are we not passing the disc_loss to the generator, and instead calculating gen_loss from the fake image and a matrix of zeros?
EDIT: Wherever I have used the terms teacher and student, please replace them with discriminator and generator respectively. Then my answer will make complete and correct sense.
- Answer to your 1st Question
The discriminator's (think of a teacher) job is to classify images as either real or fake, and hence it is trained directly on the real and fake images. The generator (think of a student), on the other hand, is trained with the help of the discriminator: the discriminator, which is itself being trained in its turn, acts as the loss function for the generator. The generator gets feedback on its generated images from the discriminator and updates its weights accordingly. This scheme makes sense throughout training, because the roles stay clear: the teacher learns on its own (the discriminator loss), while the student uses the teacher to improve. No matter how long the training goes on, this setup helps both the student and the teacher improve at their tasks, so you should always keep this loss in the training.
Now, about directly comparing real and fake images: the formal term for this is reconstruction loss. I agree with your point that we could directly compute a loss (say, a least-squares loss) between real and fake images and train the generator on that. You will study this loss in Course 3 while training Pix2Pix and CycleGAN.
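To make that concrete, here is a minimal sketch of a pixel-wise least-squares reconstruction loss, assuming PyTorch as used in the course. The tensor shapes and names are illustrative, not the assignment's actual code:

```python
import torch
import torch.nn.functional as F

# Hypothetical batch of 16 grayscale 28x28 images (values are illustrative)
real = torch.rand(16, 1, 28, 28)   # real images
fake = torch.rand(16, 1, 28, 28)   # what a generator might output

# Least-squares (pixel-wise L2) reconstruction loss: it compares the images
# directly, unlike the adversarial gen_loss, which compares the
# discriminator's *predictions* against a target matrix of ones
recon_loss = F.mse_loss(fake, real)
```

Note this loss pairs each fake with a specific real image, which is why it fits paired-translation settings like Pix2Pix rather than a plain GAN trained on unpaired data.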
- Answer to your 2nd Question
So, disc_loss is only for training the discriminator, because the discriminator is trained like every other network in deep learning, i.e., directly to predict fake or real. You should not pass this loss to the generator for the following two reasons:
disc_loss is calculated using real images, but the generator MUST NOT get to know anything about the real images during training; otherwise, we could simply train the generator on the real images directly. You will learn about Variational Autoencoders, which are based on this concept (Course 2, Week 2).
The generator wants to fool the discriminator, i.e., the discriminator should output a probability of ~1 for fake images. Therefore, in gen_loss, you are comparing the discriminator's output for fake images with a matrix of ones. But if you look into disc_loss, you are comparing the discriminator's outputs for fake and real images with matrices of zeros and ones respectively. You do this because the discriminator ONLY wants to learn to classify images as fake or real, while the generator wants to fool the discriminator. That's why you have two different losses, one each for G and D.
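The two losses described above can be sketched roughly like this, assuming PyTorch and the BCEWithLogitsLoss criterion the course uses. The prediction tensors are stand-ins for actual discriminator outputs:

```python
import torch

criterion = torch.nn.BCEWithLogitsLoss()

# Hypothetical discriminator logits for a batch of 4 fake and 4 real images
disc_fake_pred = torch.randn(4, 1)  # discriminator's output on fakes
disc_real_pred = torch.randn(4, 1)  # discriminator's output on reals

# Discriminator's loss: fakes should be classified 0, reals should be 1
disc_fake_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
disc_real_loss = criterion(disc_real_pred, torch.ones_like(disc_real_pred))
disc_loss = (disc_fake_loss + disc_real_loss) / 2

# Generator's loss: it wants the discriminator to call its fakes real,
# so the SAME fake predictions are compared against a matrix of ones
gen_loss = criterion(disc_fake_pred, torch.ones_like(disc_fake_pred))
```

Notice that both losses look at the discriminator's predictions, not at the images themselves; the targets (zeros vs. ones) are what differ between D's and G's objectives.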
The terms teacher and student may deserve a closer look, since the relationship between the Generator and the Discriminator is adversarial instead of cooperative. As you have said, the purpose of the Generator is to fool the Discriminator, while the Discriminator struggles to discriminate the fake samples, which are generated by the Generator, from the real samples. These words are perhaps more suitable for a Generative Teaching Network (GTN) than for a Generative Adversarial Network (GAN).
Thanks for correcting the mistake, and I completely agree with your point on teacher and student.
So wherever I have used the terms teacher and student, please replace them with discriminator and generator respectively. Then my answer will make complete and correct sense.
Thank you @28utkarsh for your explanation.
On the second point concerning disc_loss, I agree that the generator must not see the real images. My question is that in the current implementation there is no feedback loop from the discriminator to the generator; their loss functions are independent of each other. Doesn't that go against the overall objective of GANs, where the generator improves based on the discriminator's feedback?
The point is that the loss functions are not independent: the generator loss uses the output of the discriminator, which it then feeds to the criterion function.
Thank you @paulinpaloalto
I see where I got confused. I mixed up the discriminator's BCE loss function with the discriminator's output, which is a value between 0 (fake image) and 1 (real image).
If I am correct, this is the value that is passed to the generator to modulate its parameters to optimize its loss function.
Well, it’s a little more complicated and subtle than “values being passed”, but I think you’re headed in the right direction. It is the gradients of the cost J (which is determined by the output of the discriminator given as input the outputs of the generator, together with BCE loss) that are used to modify the parameters of the generator (the weights and biases at every layer) in order to make it better at fooling the discriminator. Of course this whole interaction is dynamic since they both change (separately) at each “iteration” of the training.
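That gradient path can be demonstrated with two tiny stand-in networks (the layer sizes and names here are assumptions purely for illustration, not the assignment's architecture):

```python
import torch
from torch import nn

# Toy stand-ins: "generator" maps noise to a fake sample,
# "discriminator" maps a sample to a real/fake logit
gen = nn.Linear(8, 16)
disc = nn.Linear(16, 1)
criterion = nn.BCEWithLogitsLoss()

noise = torch.randn(4, 8)
fake = gen(noise)

# gen_loss is computed THROUGH the discriminator, so backward() sends
# gradients all the way back into the generator's weights
gen_loss = criterion(disc(fake), torch.ones(4, 1))
gen_loss.backward()

# The generator received feedback (gradients) via the discriminator
feedback_received = gen.weight.grad is not None

# By contrast, when training the discriminator the fake batch is detached,
# so disc_loss does NOT backpropagate into the generator
disc_loss = criterion(disc(fake.detach()), torch.zeros(4, 1))
```

So the "feedback loop" is the backward pass of gen_loss through the discriminator's computation graph, not any explicit passing of disc_loss to the generator.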
One additional clarification on the original question, in case anyone reading this in the future is confused: disc_loss does not compare fake images with a matrix of zeros; it compares the discriminator's prediction for each image with 0 (the discriminator's goal is to identify fakes as fake = 0).
disc_loss also compares the discriminator's prediction for real images against 1, since it wants to identify real images as real = 1.
The reason there's a matrix of 1's and 0's in disc_loss is that it's looking at a batch of predictions.
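In other words, the "matrix" is just one target label per prediction in the batch. A quick shape check (batch size of 5 is arbitrary here):

```python
import torch

# A batch of 5 discriminator predictions (logits), one per fake image
disc_fake_pred = torch.randn(5, 1)

# The "matrix of zeros" is simply the per-image target labels for the batch:
# same shape as the predictions, one 0 for each fake image
targets = torch.zeros_like(disc_fake_pred)
```

The loss criterion then compares the two tensors element-wise, one prediction against one label.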