why do we have to multiply -1 and do mean of crit_fake _pred.

Hi @starboy,

To recap, the critic is the equivalent of a discriminator. The difference is that the critic tries to maximize the distance between its evaluation of a fake and its evaluation of a real, so its output is no longer between 0 and 1 but can be any real number.

For the generator, the loss is calculated by maximizing the critic's prediction on the generator's fake images.

The W-loss can be represented by this formula:

$$\min_g \max_c \; \mathbb{E}[c(x)] - \mathbb{E}[c(g(z))]$$

and you are now calculating the second half of the formula, which is to maximize the critic's prediction on the generator's fake images. Since the critic has scores for all the fake images in the batch, you use their mean.
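To make this concrete, here is a minimal sketch of the generator loss in PyTorch. The function name `get_gen_loss` and the example scores are illustrative assumptions, not necessarily the assignment's exact code:

```python
import torch

def get_gen_loss(crit_fake_pred):
    # Maximize the critic's mean score on the fakes by
    # minimizing its negative (hence the -1 and the mean).
    return -1. * torch.mean(crit_fake_pred)

# Example: critic scores for a batch of three fake images.
scores = torch.tensor([1.0, 2.0, 3.0])
loss = get_gen_loss(scores)  # -(1 + 2 + 3) / 3 = -2.0
```

Minimizing this loss with an optimizer pushes the critic's scores on the fakes upward, which is exactly the "maximize" half of the formula.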

You can refer to the first minute of the video Condition on Wasserstein for more information.

Hope it helps!

Fangyi Yu

You can think about it this way:

We want to **maximize** the critic's scores on the generator's fake images. Usually, what we do is **minimize** a loss function. But, if you think about it, **maximizing** a `value` is equivalent to **minimizing** `-value` (the negative of that `value`).

Taking the mean, as @fangyiyu said, just averages over all the examples in the batch, so that the loss is a single value on a consistent scale (a sum would have a different scale for each batch size).
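The "maximizing `value` = minimizing `-value`" equivalence can be seen directly from autograd. A one-parameter toy example (my own illustration, not from the assignment):

```python
import torch

# Toy objective: we want to MAXIMIZE f(w) = 3 * w.
w = torch.tensor(1.0, requires_grad=True)
loss = -(3.0 * w)   # so we MINIMIZE -f(w) instead
loss.backward()

# d(-f)/dw = -3, so a gradient-descent step (w -= lr * w.grad)
# increases w -- exactly a gradient-ascent step on f itself.
print(w.grad)  # tensor(-3.)
```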

I would just add something to the answers given by @fangyiyu and @pedrorohde. The reason we multiply by -1 is that in deep learning frameworks like PyTorch, optimization works by minimizing the objective function. As such, we place the minus sign in front!

As mentioned by @sohonjit.ghosh, in computer science, when we want to convert a maximization problem into a minimization problem (as with the knapsack problem), we multiply the loss/cost function by a minus sign. Your question is one such case.

Hey @pedrorohde @sohonjit.ghosh,

I have a small doubt regarding this. Just after this function, we are told to calculate the loss for the critic, and it is written that

For the critic, the loss is calculated by maximizing the distance between the critic's predictions on the real images and its predictions on the fake images, while also adding a gradient penalty. The gradient penalty is weighted by lambda.

In this case, as well, we need to maximize the distance, so why aren't we minimizing the negative of the expression in this case as well? In other words, why are we using

`crit_loss = torch.mean(crit_fake_pred - crit_real_pred + c_lambda * gp)`

and not

`crit_loss = -1 * torch.mean(crit_fake_pred - crit_real_pred + c_lambda * gp)`

I have submitted my notebook, and it is showing me an unexpected score. Have I done anything wrong?

Hi @Elemento

Maybe the assignment's phrasing is a little confusing, because the "distance" between the two could go either way: it could be either `crit_fake_pred - crit_real_pred` or `crit_real_pred - crit_fake_pred`.

If you think about it, this is the **critic**'s loss. We want the critic to score fake images low and real images high. That means we want `crit_fake_pred` to be very small (**minimize** it), and `crit_real_pred` to go up (**maximize** it).

Putting it in terms of optimization, this leads our minimization loss function to be `crit_fake_pred - crit_real_pred`, which amounts exactly to minimizing `crit_fake_pred` while maximizing `crit_real_pred`.

As for the gradient penalty (`c_lambda * gp`), it should be **minimized**: we want the gradient norm to be as close to 1 as possible, and the penalty function measures the distance from the norm to 1, so minimizing it pushes the gradient norm toward 1.
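Putting the whole critic loss together, here is a minimal sketch in PyTorch. The `gradient_penalty` helper and the function names are my own illustration of the standard WGAN-GP recipe, not the assignment's provided code:

```python
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    # Interpolate between real and fake images (one epsilon per example).
    eps = torch.rand(real.size(0), 1, 1, 1, device=device)
    mixed = eps * real + (1 - eps) * fake
    mixed.requires_grad_(True)
    scores = critic(mixed)
    # Gradient of the critic's scores w.r.t. the interpolated images.
    grad = torch.autograd.grad(
        outputs=scores, inputs=mixed,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad = grad.view(grad.size(0), -1)
    # Penalize the squared distance of each gradient norm from 1.
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()

def get_crit_loss(crit_fake_pred, crit_real_pred, gp, c_lambda):
    # No overall -1 here: minimizing (fake - real) already pushes fake
    # scores down and real scores up, and the penalty is added (not
    # subtracted) because it, too, should be minimized.
    return torch.mean(crit_fake_pred - crit_real_pred + c_lambda * gp)
```

Note how a minus sign on the whole expression would flip everything at once: it would reward the critic for scoring fakes high and reals low, and would turn the penalty into a bonus for violating the gradient-norm constraint.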

Hope that made sense for you. Cheers

Thanks a lot @pedrorohde, it makes complete sense to me now!