I understand that with 1-L continuity we want to keep the gradients from skyrocketing and within a limited range for stable learning. But why are the gradients taken w.r.t. the mixed images? Why not w.r.t. the model's trainable weights? And why only for the critic? Can someone please explain this to me?
It would be worth listening again to what Prof Zhou says about this in the lectures. The mixed images are used only for the gradient penalty term, which she describes as a “regularization” term. That is a more tractable way to enforce 1-L continuity, and my interpretation is that mixing the real and fake images is just a simple way to include both the generator and the critic in the penalty term. It doesn’t have to be sophisticated. We’re not applying any metric of “goodness” (meaning the quality of the images) in the gradient penalty term: it’s just there to keep the gradients within the 1-L bounds. The image quality is driven by the base part of the cost function.
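To make the penalty term concrete, here is a minimal sketch in plain Python. 1-L continuity limits how fast the critic's output can change as its input changes, so the gradient inside the penalty is taken with respect to the input (the mixed image), not the weights. The tiny `tanh` critic, the 4-element "images", and all names here (`critic_input_grad`, `gradient_penalty`, `epsilon`) are assumptions made for illustration, not the course's code. A real implementation would get this gradient from an autograd library instead of deriving it by hand.

```python
import math
import random

def critic(x, W, v):
    """Tiny hypothetical critic: score = sum_j v[j] * tanh(W[j] . x)."""
    return sum(vj * math.tanh(sum(wji * xi for wji, xi in zip(row, x)))
               for row, vj in zip(W, v))

def critic_input_grad(x, W, v):
    """Analytic gradient of the critic score w.r.t. the INPUT x."""
    grad = [0.0] * len(x)
    for row, vj in zip(W, v):
        h = sum(wji * xi for wji, xi in zip(row, x))
        scale = vj * (1.0 - math.tanh(h) ** 2)  # d tanh(h)/dh = 1 - tanh^2(h)
        for i, wji in enumerate(row):
            grad[i] += scale * wji
    return grad

def gradient_penalty(real, fake, W, v, epsilon):
    """(||grad_x critic(x_hat)|| - 1)^2 evaluated at the mixed point x_hat."""
    mixed = [epsilon * r + (1.0 - epsilon) * f for r, f in zip(real, fake)]
    grad = critic_input_grad(mixed, W, v)
    norm = math.sqrt(sum(g * g for g in grad))
    return (norm - 1.0) ** 2

rng = random.Random(0)
real = [rng.uniform(-1, 1) for _ in range(4)]   # stand-in for a real image
fake = [rng.uniform(-1, 1) for _ in range(4)]   # stand-in for a generator output
W = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # critic weights
v = [rng.uniform(-1, 1) for _ in range(3)]                      # critic weights
epsilon = rng.random()                          # random mixing weight in [0, 1)

penalty = gradient_penalty(real, fake, W, v, epsilon)
print(f"gradient penalty: {penalty:.4f}")
```

Note that the penalty value still depends on `W` and `v`: the gradient is measured with respect to the input, but it is the critic's weights that get adjusted to shrink the penalty.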
But everything here is based on the outputs of the generator and the critic, so the model weights are fully involved. And the only cost we have is based on the output of the critic, right? Either the critic applied to a fake image or the critic applied to a real image. That’s all we have to drive the learning based on the gradients. Then the gradients propagate backward to the weights of both the critic and the generator, but for the generator the gradients go through the critic by definition.
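The point about generator gradients passing through the critic can be sketched with a one-weight toy model. The scalar generator `a * z`, the scalar critic `w * x`, and the function names are all assumptions chosen so the chain rule can be written out by hand; they are not the course's architecture.

```python
def generator(z, a):
    """Toy generator: fake = a * z (a is the generator's only weight)."""
    return a * z

def critic(x, w):
    """Toy critic: score = w * x (w is the critic's only weight)."""
    return w * x

def generator_loss(z, a, w):
    """The generator wants a high critic score on its fakes: loss = -critic(fake)."""
    return -critic(generator(z, a), w)

def generator_loss_grad_a(z, a, w):
    """Chain rule: dLoss/da = dLoss/dscore * dscore/dfake * dfake/da."""
    dloss_dscore = -1.0
    dscore_dfake = w      # this factor comes from the critic
    dfake_da = z          # this factor comes from the generator
    return dloss_dscore * dscore_dfake * dfake_da

z, a, w = 0.5, 2.0, 3.0
grad = generator_loss_grad_a(z, a, w)
print(grad)  # -1.5
```

The middle factor `dscore_dfake` is the critic's gradient with respect to its input, so the generator's weight update cannot be computed without going through the critic.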