Why do we have to take the mean in the final answer when no mean appears in the equation?

What is the purpose of taking the mean?

Hi @starboy,

The mean is taken because the penalty is computed over a whole batch of data points/images. Taking the mean scales the penalty down to a per-image value.

The reason is that the critic's loss (the difference between the fake and real distribution means) is itself a mean over the batch, so the penalty must also be a batch mean to stay consistent with the critic's loss.
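
For reference, here is a sketch of where the mean comes from. It assumes the equation in question is the standard WGAN-GP critic loss, which is not quoted in this thread. The symbols $D$ (critic), $\tilde{x}$ (fake sample), $x$ (real sample), $\hat{x}$ (interpolated sample), $\lambda$ (penalty weight) and $m$ (batch size) are part of that assumption. Every term is an expectation, and on a batch an expectation is estimated by a mean:

$$
L = \mathbb{E}_{\tilde{x}}\big[D(\tilde{x})\big] - \mathbb{E}_{x}\big[D(x)\big] + \lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]
$$

$$
L \approx \frac{1}{m}\sum_{i=1}^{m} D(\tilde{x}_i) - \frac{1}{m}\sum_{i=1}^{m} D(x_i) + \lambda\,\frac{1}{m}\sum_{i=1}^{m} \big(\lVert \nabla_{\hat{x}_i} D(\hat{x}_i) \rVert_2 - 1\big)^2
$$

So the mean is present in the equation, written as $\mathbb{E}$.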

Suppose you don't take the mean of the penalty: the penalty value then grows as the batch size increases, and training can destabilize because the loss value changes drastically.
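
To make the batch-size effect concrete, here is a minimal pure-Python sketch. The gradient norms are made-up numbers and the helper names (`fake_gradient_norms`, `summed_penalty`, `mean_penalty`) are hypothetical. In real training the norms would come from the critic's gradients on the interpolated images.

```python
# Hypothetical per-sample gradient norms (illustrative values only).
def fake_gradient_norms(batch_size):
    pattern = [0.5, 1.5, 1.0, 2.0]
    return [pattern[i % len(pattern)] for i in range(batch_size)]


def penalty_terms(norms):
    # Per-sample penalty: squared distance of the gradient norm from 1.
    return [(n - 1.0) ** 2 for n in norms]


def summed_penalty(norms):
    # Without the mean: grows with the number of samples in the batch.
    return sum(penalty_terms(norms))


def mean_penalty(norms):
    # With the mean: a per-sample value, independent of batch size.
    terms = penalty_terms(norms)
    return sum(terms) / len(terms)


for batch_size in (16, 64, 256):
    norms = fake_gradient_norms(batch_size)
    print(batch_size, summed_penalty(norms), mean_penalty(norms))
```

With these made-up norms, the summed penalty grows from 6.0 at batch size 16 to 96.0 at batch size 256, while the mean stays at 0.375 for every batch size.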

