In the Week 3 video lecture named '1-Lipschitz Continuity Enforcement', it's mentioned that the formula for the WGAN-GP loss is:

min_g max_c E[c(x)] - E[c(g(z))] + lambda * E[(||grad(c(x_hat))||_2 - 1)^2]

Here the first term is the expectation of the critic's scores on real images, and the second term is the expectation of the critic's scores on fake (generated) images.

Whereas in the Week 3 assignment for WGAN, the working code for calculating the critic's loss is:

crit_loss = torch.mean(crit_fake_pred) - torch.mean(crit_real_pred) + c_lambda*gp

In the code, however, the first term is for the fake generated images.

This is a bit confusing or am I missing something? Please help me with this issue.

Thanks a lot in advance,

Fellow Coursemate

Hi Ajeesh_Ajayan!

Welcome to the community

From the equation, you can infer that, for the critic, the loss is calculated by maximizing the distance between the critic’s predictions on the real images and the predictions on the fake images while also adding a gradient penalty.

In the programming assignment, we calculate the negative of this distance. Why? In an implementation, back-prop minimizes a loss function, but our requirement here is to maximize the distance (the critic's objective). Maximizing a quantity is equivalent to minimizing its negative: pushing a value up from 5 to 7 is the same as pushing its negative down from -5 to -7 (the absolute values are the same).

This is what they are doing here: instead of maximizing the objective directly, they minimize the negative of the objective. Hope you get the point; if not, feel free to post your queries.
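To make this concrete, here is a minimal sketch (with made-up critic scores, not the assignment's actual data) showing that the quantity the notebook minimizes is exactly the negative of the distance the critic wants to maximize:

```python
import torch

# Hypothetical critic scores on a batch of real and fake images
crit_real_pred = torch.tensor([2.0, 3.0, 1.5])
crit_fake_pred = torch.tensor([-1.0, 0.5, -0.5])

# Distance the critic wants to MAXIMIZE (lecture's view, ignoring the penalty):
distance = torch.mean(crit_real_pred) - torch.mean(crit_fake_pred)

# What the notebook MINIMIZES (again ignoring the penalty):
neg_distance = torch.mean(crit_fake_pred) - torch.mean(crit_real_pred)

print(distance.item(), neg_distance.item())  # one is the negative of the other
```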

Regards,

Nithin

Hi Nithin,

Thanks a lot for your explanation. The problem I am facing with this equation is: if they are actually using the negative of the loss function, shouldn't the penalty term also be negative?

It's positive in both equations.

Please clarify this.

Thanks again.

Good question.

The loss function is min_g max_c [E[c(x)] - E[c(g(z))]] + penalty. The penalty is added to the loss function separately (it has to be minimized), and the min-max part is broken down as I said before.

So the penalty retains the same sign, since it is simply added on top:

loss = original_loss + penalty, where original_loss = -(distance)

In the lecture's equation, the loss is the whole min-max objective plus the penalty; in the implementation, loss = -(distance) + penalty. Hope this helps.
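As a side note, the penalty term E[(||grad(c(x_hat))||_2 - 1)^2] from the formula is typically computed along these lines. This is a sketch under my own naming, not the assignment's exact code; the toy critic at the bottom is just there to show the function runs:

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake):
    # x_hat: a random interpolation between real and fake images
    eps = torch.rand(real.size(0), 1, 1, 1)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    # Gradient of the critic's scores with respect to x_hat
    grad = torch.autograd.grad(
        outputs=scores,
        inputs=x_hat,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,  # so the penalty itself can be back-propagated
    )[0]
    # L2 norm of the gradient per image in the batch
    grad_norm = grad.view(grad.size(0), -1).norm(2, dim=1)
    # Penalize deviation of the gradient norm from 1
    return torch.mean((grad_norm - 1) ** 2)

# Quick check with a toy linear critic on 4 fake "images" of shape 1x8x8:
critic = nn.Sequential(nn.Flatten(), nn.Linear(64, 1))
real = torch.randn(4, 1, 8, 8)
fake = torch.randn(4, 1, 8, 8)
gp = gradient_penalty(critic, real, fake)
```

Because the penalty is a squared deviation, it is non-negative either way, and it always gets added to the minimized loss.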


Ohh got it!! Thanks a lot for the clarification.

If you use the formula as mentioned in the lectures, you must set lambda to a negative value. But if you use the formula as in the notebook, you set lambda to a positive value.
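A quick numeric check of this lambda-sign equivalence for the critic's loss (the score and penalty values are hypothetical):

```python
import torch

crit_real_pred = torch.tensor([2.0, 3.0])
crit_fake_pred = torch.tensor([-1.0, 0.5])
gp = torch.tensor(0.8)  # hypothetical gradient-penalty value
c_lambda = 10.0

# Notebook form, minimized directly (positive lambda):
notebook_loss = (torch.mean(crit_fake_pred) - torch.mean(crit_real_pred)
                 + c_lambda * gp)

# Lecture form, which the critic maximizes, with a NEGATIVE lambda:
lecture_objective = (torch.mean(crit_real_pred) - torch.mean(crit_fake_pred)
                     + (-c_lambda) * gp)

# Maximizing the lecture objective = minimizing its negative,
# which is exactly the notebook loss:
print(notebook_loss.item(), lecture_objective.item())
```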