crit_real_pred and crit_fake_pred are the critic's scores for the batch of real and fake images, respectively. When calculating crit_loss, the mean should be taken over just these two arguments. The gradient penalty term, gp * c_lambda, is then added to that mean.
If I write "crit_fake_pred - crit_real_pred" instead of "crit_real_pred - crit_fake_pred", the unit test still passes. Another discussion says that the correct order is "crit_real_pred - crit_fake_pred". I'm confused, could you explain what this ordering means, please?
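To make sure we are talking about the same expressions, this is roughly what the two orderings look like in my notebook (just a sketch with made-up values, since the real inputs come from the critic and the gradient penalty function):

```python
import torch

# Made-up stand-ins for the real inputs, just to show the two orderings
crit_real_pred = torch.tensor([2.0, 3.0, 2.5, 1.5])    # critic scores on real images
crit_fake_pred = torch.tensor([-1.0, 0.0, -0.5, 0.5])  # critic scores on fake images
gp, c_lambda = torch.tensor(0.2), 10                    # gradient penalty and its weight

# Order that passes the unit test for me:
loss_v1 = torch.mean(crit_fake_pred) - torch.mean(crit_real_pred) + c_lambda * gp

# Order the other discussion mentions:
loss_v2 = torch.mean(crit_real_pred) - torch.mean(crit_fake_pred) + c_lambda * gp
```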
My understanding of the W-Loss is that the critic is given real and fake images to score, which forms two distributions: one of scores for real images, and the other of scores for fake images. The further apart these two distributions are, the better the critic is at telling the two categories apart. So I am not sure what you meant by the order of real_pred and fake_pred.
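To make the two-distributions picture concrete, here is a tiny sketch with made-up scores: the bigger the gap between the mean real score and the mean fake score, the better the critic is separating the two categories.

```python
import torch

# Made-up critic scores: one distribution for real images, one for fakes
real_scores_weak = torch.tensor([0.6, 0.4, 0.5, 0.7])    # weak critic: distributions overlap
fake_scores_weak = torch.tensor([0.3, 0.5, 0.4, 0.6])

real_scores_strong = torch.tensor([4.0, 5.0, 4.5, 5.5])  # strong critic: distributions far apart
fake_scores_strong = torch.tensor([-4.0, -5.0, -4.5, -3.5])

print(real_scores_weak.mean() - fake_scores_weak.mean())      # small gap (~0.1)
print(real_scores_strong.mean() - fake_scores_strong.mean())  # large gap (~9.0)
```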
@MarcosMM, to follow up a little more on the code implementation part of your question:
As @Kic explains, the main concept is that the critic wants to maximize the difference between its predictions for real and fake. In other words, when you’re calculating the critic’s loss, the farther apart the real and fake predictions are, the smaller the loss should be.
So, if we know the critic wants to maximize crit_real_pred - crit_fake_pred, what does that mean our implementation for the critic’s loss should look like?
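To make that concrete, here is a minimal sketch of what such a loss could look like in PyTorch, assuming the standard WGAN-GP form (the function name here is just for illustration and may not match the assignment's starter code):

```python
import torch

def crit_loss_sketch(crit_fake_pred, crit_real_pred, gp, c_lambda):
    """Sketch of a WGAN-GP critic loss: the critic wants to *maximize*
    mean(real) - mean(fake), so the loss it minimizes is the negative of
    that gap, plus the weighted gradient penalty."""
    return torch.mean(crit_fake_pred) - torch.mean(crit_real_pred) + c_lambda * gp
```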
Another thing to keep in mind is that the larger the critic's prediction, the more confident the critic is that an image is real. I think these optional hints in the assignment do a good job of using that concept to give some intuition about what we should be subtracting and what we should be adding:
The higher the mean fake score, the higher the critic’s loss is.
What does this suggest about the mean real score?
Another thing I like to do when I'm trying to sanity-check an approach is to try out a couple of simple examples and make sure they behave sensibly. For example, if real_pred = 5 and fake_pred = 1, versus real_pred = 20 and fake_pred = 1, does my approach for calculating the critic's loss give a smaller loss for the real_pred = 20 case, as it should?
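For what it's worth, here is that check written out, reusing the sketch from above (single-element tensors and a zero gradient penalty just to keep the arithmetic obvious):

```python
import torch

def loss_sketch(real_pred, fake_pred, gp=0.0, c_lambda=10):
    # Same form as the sketch above: mean(fake) - mean(real) + weighted penalty
    return torch.mean(fake_pred) - torch.mean(real_pred) + c_lambda * gp

print(loss_sketch(torch.tensor([5.0]),  torch.tensor([1.0])))  # -4.0
print(loss_sketch(torch.tensor([20.0]), torch.tensor([1.0])))  # -19.0 (smaller, as expected)
```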