In the programming assignment, the generator loss is calculated by taking the mean of the critic’s prediction scores for the fake images and then negating it. A few questions:

How is simply the mean of the critic’s scores a loss? Don’t we need to use a loss function?

Why is the mean then negated? Is it because that is standard practice, as with the loss functions for logistic regression, etc.?

It’s not a “standard practice” to multiply loss functions by -1. In the cross entropy loss case, the function is based on the logarithm of a number between 0 and 1, which is a negative value, right? So we multiply by -1 in that case to convert it to a positive value.
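
To make the cross entropy point concrete, here is a minimal sketch in plain Python. It is not the assignment's code: the function name `binary_cross_entropy` and the sample probabilities and labels are made up for illustration. It only shows that the log terms are never positive, so the negation is what makes the loss non-negative.

```python
import math

def binary_cross_entropy(probs, labels):
    """Mean binary cross entropy; probs lie strictly in (0, 1), labels are 0 or 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        # log(p) and log(1 - p) are both <= 0 when p is between 0 and 1,
        # so each per-example log-likelihood is <= 0.
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    # Negating the (non-positive) mean log-likelihood gives a non-negative loss.
    return -total / len(probs)

# Illustrative values only.
probs = [0.9, 0.2, 0.6]
labels = [1, 0, 1]
loss = binary_cross_entropy(probs, labels)
print(loss)
```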

Wasserstein Loss is not a typical loss function whose values are all positive by definition. The reasoning behind Wasserstein Loss is explained in detail in the lectures. Now that you’ve seen what happens when the rubber meets the road in terms of writing the code, it might be worth going back and listening again to what Prof Zhou says about all this. Note that the critic’s output is different from that of the usual discriminator: instead of using a sigmoid activation on the output to give an answer that looks like a probability, we use no activation at all. That means the outputs can be any real number, either positive or negative. As always, we have a duel between the critic and the generator, but for W-Loss it is expressed as the mean of the critic’s outputs on real images minus the mean of the critic’s outputs on fake images. The critic’s goal is to maximize that difference and the generator’s goal is to minimize it.
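
As a rough sketch of how those two goals turn into losses to minimize, here is a plain-Python version. The function names and the sample scores are hypothetical, it works on lists of numbers rather than tensors, and it covers only the two terms described above, so treat it as an illustration rather than the assignment's implementation.

```python
from statistics import fmean

def critic_loss(real_scores, fake_scores):
    # The critic wants to MAXIMIZE mean(real) - mean(fake).
    # Optimizers minimize, so the critic minimizes the negative of that difference.
    return -(fmean(real_scores) - fmean(fake_scores))

def generator_loss(fake_scores):
    # The generator wants to MINIMIZE mean(real) - mean(fake).
    # The real term does not depend on the generator, so only -mean(fake) is left.
    return -fmean(fake_scores)

# Illustrative raw critic outputs: no sigmoid, so any real number is possible.
real_scores = [1.5, 0.5]
fake_scores = [-2.0, 1.0]

c_loss = critic_loss(real_scores, fake_scores)
g_loss = generator_loss(fake_scores)
print(c_loss, g_loss)
```

Neither value is constrained to be positive here, which is fine: what matters is the direction each player pushes the difference, not the sign of the number.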

Thanks, the multiplication by -1 in the cross entropy loss makes sense.
I still don’t understand why, when computing the generator’s loss, we negate the critic’s score for the fake images, since the mean score could be positive or negative (the critic’s output can be either, as you mentioned). Maybe it’s time to revisit Prof. Zhou’s video one more time.

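I watched the lecture again and my understanding is that it’s not so much that we are negating the critic’s score for the fakes. Rather, the overall goal is to maximize or minimize the distance between the critic’s evaluation of the real images and its evaluation of the fake images. That is expressed as the difference between the two values, so the formula is the critic’s value for the real minus the critic’s value for the fakes. The critic’s goal is to maximize that value and the generator’s goal is to minimize it. Because the generator has no influence on the critic’s value for the real images, minimizing that difference comes down to minimizing the negative of the critic’s mean score on the fakes, which is the generator loss in the assignment.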
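
On the worry that the mean score could be positive or negative: a toy example may help show that the sign does not matter. The sketch below assumes a made-up, fixed linear critic and treats the fake "image" as a single number updated directly by gradient descent, which is far simpler than a real generator. Minimizing the negated score pushes the score up in both cases.

```python
W, B = 2.0, -5.0  # hypothetical fixed critic parameters

def critic(x):
    # Toy critic: a raw linear score with no sigmoid, so it can be any real number.
    return W * x + B

def generator_step(fake, lr=0.1):
    # Generator loss is -critic(fake) = -(W * fake + B); its gradient w.r.t. fake is -W.
    grad = -W
    # Plain gradient descent on that loss.
    return fake - lr * grad

results = []
for start in (1.0, 4.0):  # one start with a negative score, one with a positive score
    before = critic(start)
    after = critic(generator_step(start))
    results.append((before, after))
    print(f"score before: {before:.2f}, after: {after:.2f}")
```

In both cases the critic's score for the fake goes up after the step, whether it started below or above zero.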

It would definitely be worth watching the video again yourself and seeing whether that explanation makes more sense when you hear Prof Zhou explain it a second time.