The lecture glossed over why a gradient penalty term is required to enforce 1-L continuity. Moreover, any gradient norm other than 1 is penalised.
Can somebody explain how adding a gradient penalty helps achieve 1-L continuity?
Hi Richeek,
Welcome to the community!
Take a look at this attachment:
As you can see here, the regularization term (the gradient penalty) is a non-negative number that is added to the loss function. When you minimize the loss, you are also pushing the regularization term toward zero so that it doesn't have a big say in the loss. Driving it toward zero means pushing the norm of the critic's gradient toward 1, which indirectly enforces the constraint. It is also preferred over weight clipping because training is more stable.
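To make that concrete, here is a minimal sketch in plain Python. It is a toy, not the course's implementation: it assumes a linear critic f(x) = w . x, whose gradient with respect to x is just w, so the gradient norm is ||w||. The names `grad_norm`, `gradient_penalty`, `penalty_step` and the values `lam=10.0` and `lr=0.01` are illustrative choices of mine. In a real WGAN-GP the critic is a neural network and the gradient is taken at points interpolated between real and generated samples.

```python
import math

def grad_norm(w):
    """Gradient norm of a toy linear critic f(x) = w . x (its gradient w.r.t. x is w)."""
    return math.sqrt(sum(wi * wi for wi in w))

def gradient_penalty(w, lam=10.0):
    """Two-sided penalty lam * (||grad|| - 1)^2; it is zero only when the norm is 1."""
    return lam * (grad_norm(w) - 1.0) ** 2

def penalty_step(w, lam=10.0, lr=0.01):
    """One gradient-descent step on the penalty term alone (assumed toy setup)."""
    n = grad_norm(w)
    # d/dw [lam * (n - 1)^2] = 2 * lam * (n - 1) * w / n
    scale = 2.0 * lam * (n - 1.0) / n
    return [wi - lr * scale * wi for wi in w]

w = [3.0, 4.0]  # starts with gradient norm 5, far from 1
for _ in range(200):
    w = penalty_step(w)

final_norm = grad_norm(w)
print(final_norm)  # ends up very close to 1
```

Minimizing the penalty shrinks the norm from 5 toward 1. In the full critic loss this term is added to the Wasserstein term, so the norm is pulled toward 1 rather than pinned there.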
Hope that makes sense. If not, feel free to post your queries.
Regards,
Nithin
What about when the gradient norm is less than 1 (which still satisfies 1-L continuity)? Why does the critic's loss function want it to be exactly 1?
It doesn't need to be exactly 1, just very close to 1: say, in the region 1-alpha to 1+alpha, where alpha is a very small number.
1-L continuity requires the norm to be at most 1 at all points. It can be less than 1, but by keeping it near 1 I'm still approximately enforcing the constraint.
We also have to consider model training: we need a method that gives stable training while following the constraint, which is why it is implemented this way.
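For comparison, here is a small sketch of the two-sided penalty discussed above next to a one-sided variant that only penalises norms above 1. The function names and `lam=10.0` are illustrative assumptions, and the inputs are hand-picked gradient norms rather than values from a trained critic.

```python
def two_sided_penalty(norm, lam=10.0):
    """Penalises any deviation of the gradient norm from 1, above or below."""
    return lam * (norm - 1.0) ** 2

def one_sided_penalty(norm, lam=10.0):
    """Penalises only norms greater than 1, i.e. actual violations of 1-L continuity."""
    return lam * max(0.0, norm - 1.0) ** 2

# Hand-picked example norms: below, at, and above 1
for norm in (0.5, 1.0, 1.5):
    print(norm, two_sided_penalty(norm), one_sided_penalty(norm))
```

A norm of 0.5 satisfies the constraint, yet the two-sided version still penalises it, which is exactly the behaviour the question is about. The one-sided version leaves it alone.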
I see, thanks for your reply!