Hi, just a question about Disentanglement by Supervision:
I understood the lecture this way:
“One way to encourage the model to build a disentangled Z-space representation is to label your data and use a process similar to the one we used for Conditional GANs.
But since the information, e.g. hair color, is already encoded in the Z-space, you do not need an additional class vector here, just additional labels for the controllable features in the real images.”
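Maybe it helps if I sketch how I understood the difference in (made-up) code, so it is clear what I mean. All names and shapes here are placeholders I invented, not from the course:

```python
import torch

# In a Conditional GAN, the class vector is appended to the noise:
z = torch.randn(8, 64)                        # noise vectors, batch of 8
class_vec = torch.zeros(8, 10)                # one-hot class vector (10 classes)
gen_input = torch.cat([z, class_vec], dim=1)  # generator input: 74 dims

# Whereas here (if I got it right) the generator input stays plain z,
# and only the REAL images carry extra feature labels, e.g. hair color:
real_images = torch.randn(8, 3, 64, 64)       # dummy real image batch
real_labels = torch.zeros(8, 5)               # 5 labeled controllable features
```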
To me that means the real images have additional labels, but the generated images don’t. That corresponds to the image shown:
My question: if I add those labels to the real images, the critic can use this information to distinguish those features; fine so far.
But if these labels are missing from the generated images, what is the input to the critic then?
Only the image without labels? That wouldn’t match the input size of the labeled real inputs.
Or labels produced by the generator? In that case the generator’s output would have to be the image pixels plus the labels (see the sketch below)?
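To spell out the two options I can imagine, here is a rough sketch (again, all shapes and helper names are mine, not from the course):

```python
import torch

# Option A: the critic sees image + labels; the labels are broadcast to
# feature maps and concatenated as extra channels (Conditional-GAN style):
def critic_input(images, labels):
    # images: (B, 3, 64, 64), labels: (B, 5)
    label_maps = labels.view(-1, 5, 1, 1).expand(-1, -1, 64, 64)
    return torch.cat([images, label_maps], dim=1)  # (B, 8, 64, 64)

real = critic_input(torch.randn(8, 3, 64, 64), torch.zeros(8, 5))
print(real.shape)  # torch.Size([8, 8, 64, 64])

# Option B: the generator itself outputs the labels alongside the pixels,
# so the fake critic input would have the same shape as the real one:
# fake_images, fake_labels = generator(z)   # <- is this the idea?
# fake = critic_input(fake_images, fake_labels)
```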
Since this is not shown in the course image, I am not sure whether I got it right.
Thanks for any clarification!
- Link to the classroom item I am referring to:
https://www.coursera.org/learn/build-basic-generative-adversarial-networks-gans/lecture/VPQ7i/disentanglement