Why do we have to multiply by -1 and take the mean of crit_fake_pred?
Hi @starboy,
To recap, the critic is the equivalent of a discriminator; the difference is that the critic tries to maximize the distance between its evaluation of a fake and its evaluation of a real, so its output is no longer between 0 and 1 but can be any real number.
For the generator, the loss is calculated by maximizing the critic’s prediction on the generator’s fake images.
The W-loss can be represented by this formula:
$$\min_g \max_c \; \mathbb{E}\big[c(x)\big] - \mathbb{E}\big[c(g(z))\big]$$
You can refer to the first minute of the video Condition on Wasserstein for more information.
Hope it helps!
Fangyi Yu
You can think about it this way:
We want to maximize the critic’s scores on the generator’s fake images. Usually, what we do is minimize a loss function. But if you think about it, maximizing a value is equivalent to minimizing its negative.
Taking the mean, as @fangyiyu said, is just to average over all examples in the batch, so as to have a single scaled value for the loss (if we did a sum, it would have different scales for different batch sizes).
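Putting both points together, here is a minimal PyTorch sketch of the generator loss (the name get_gen_loss is just illustrative here):

```python
import torch

def get_gen_loss(crit_fake_pred):
    # We want to *maximize* the critic's scores on the fakes, so we
    # *minimize* their negative; the mean averages over the batch so
    # the loss scale does not depend on the batch size.
    return -torch.mean(crit_fake_pred)
```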
I would just add something to the answers given by @fangyiyu and @pedrorohde. The reason we are multiplying by -1 is that, in deep learning frameworks like PyTorch, optimization works by minimizing the objective function. As such, we place the minus sign in front!
As mentioned by @sohonjit.ghosh, in computer science, when we want to convert a maximization problem into a minimization problem (as with the knapsack problem), we multiply the loss/cost function by a minus sign. Your question is one such case.
Hey @pedrorohde @sohonjit.ghosh,
I have a small doubt regarding this. Just after this function, we are told to calculate the loss for the critic, and it is written that
For the critic, the loss is calculated by maximizing the distance between the critic’s predictions on the real images and the predictions on the fake images while also adding a gradient penalty. The gradient penalty is weighed according to lambda.
In this case as well, we need to maximize the distance, so why aren’t we minimizing the negative of the expression here too? In other words, why are we using
crit_loss = torch.mean(crit_fake_pred - crit_real_pred + c_lambda * gp)
and not
crit_loss = -1 * torch.mean(crit_fake_pred - crit_real_pred + c_lambda * gp)
I have submitted my notebook, and it is showing me an out-of score. Have I done anything wrong?
Hi @Elemento
Maybe the assignment’s phrasing is a little confusing, because the “distance” between the two predictions could go either way: it could be either crit_fake_pred - crit_real_pred or crit_real_pred - crit_fake_pred.
If you think about it, this is the critic’s loss. We want the critic to score fake images low and real images high (remember, its output isn’t bounded between 0 and 1). That means we want crit_fake_pred to be very small (minimize it), and crit_real_pred to go up (maximize it).
Putting it in terms of optimization, this leads our minimization loss function to be crit_fake_pred - crit_real_pred, which amounts exactly to minimizing crit_fake_pred and maximizing crit_real_pred.
As for the gradient penalty (c_lambda * gp), it should be minimized: we want the gradient norm to be as close to 1 as possible, and our penalty function measures the distance from the norm to 1. Minimizing it makes the gradient norm get closer to 1.
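To make the direction of each term concrete, here is a minimal PyTorch sketch; the gradient_penalty helper below is an illustrative assumption about how gp is computed, not the assignment’s exact code:

```python
import torch

def get_crit_loss(crit_fake_pred, crit_real_pred, gp, c_lambda):
    # Minimizing this pushes crit_fake_pred down, crit_real_pred up,
    # and the weighted gradient penalty toward zero.
    return torch.mean(crit_fake_pred - crit_real_pred + c_lambda * gp)

def gradient_penalty(critic, real, fake, device="cpu"):
    # Illustrative: score random interpolations of real and fake images,
    # then penalize the critic's gradient norm for straying from 1.
    eps = torch.rand(real.size(0), 1, 1, 1, device=device)
    mixed = (eps * real + (1 - eps) * fake).requires_grad_(True)
    mixed_scores = critic(mixed)
    gradient = torch.autograd.grad(
        outputs=mixed_scores,
        inputs=mixed,
        grad_outputs=torch.ones_like(mixed_scores),
        create_graph=True,
    )[0]
    gradient_norm = gradient.view(gradient.size(0), -1).norm(2, dim=1)
    return torch.mean((gradient_norm - 1) ** 2)
```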
Hope that made sense for you. Cheers
Thanks a lot @pedrorohde, it makes complete sense to me now!