Disentanglement by Supervision

Hi, just a question about Disentanglement by Supervision.
This is how I understood the lecture:
“One way to encourage the model to build a disentangled Z-space representation is to label your data and use a process similar to the one we used in Conditional GANs.
But since the information, e.g. hair color, is encoded in the Z-space, you do not need an additional class vector here, just additional labels for the controllable features in the real images.”
To me that means the real images have additional labels, but the generated images don’t. That corresponds to the image shown:

My question is: if I add those labels to the real images, the critic can use this information to distinguish these features. OK so far.
But if these labels are missing for the generated images, what is the input to the critic then?
Only the image without labels? That wouldn’t match the input size used for the real images.
Labels generated by the generator? Then the generator’s output would have to be both the image pixels and the labels?
Since this is missing from the course image, I am not sure whether I understood it correctly.

Thanks for the clarification!

Hey there @Bernhard_Wieczorek,

To encourage disentanglement, the real images need additional labels for the controllable features (e.g., “Husky” for dog breed). The critic’s input has to have the same format for real and generated data, so the generated images need labels too: the critic sees pairs of (real image, real labels) for real data and pairs of (generated image, generated labels) for generated data. So yes, on the generator side you need both the image and its corresponding labels. With consistent inputs on both sides, the critic can check whether an image matches its labels, which pushes the model toward a disentangled latent space (it encourages disentanglement, it doesn’t guarantee it).
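
To make the input formats concrete, here is a minimal, framework-free sketch of the two kinds of pairs the critic would see. It is only an illustration under assumptions that are not from the lecture: the sizes `IMG_PIXELS`, `N_LABELS` and `Z_DIM`, the helper names `toy_generator` and `critic_input`, and in particular the choice that the labels for a generated image are simply the feature values placed in the first entries of the noise vector. Your actual model may supply the generated labels differently.

```python
import random

# All sizes and names below are hypothetical, chosen only for illustration.
IMG_PIXELS = 4   # tiny flattened "image"
N_LABELS = 2     # number of labeled, controllable features
Z_DIM = 6        # noise size; the first N_LABELS entries carry the features


def toy_generator(z):
    """Stand-in for a generator: maps z to IMG_PIXELS values (not a trained model)."""
    return [sum(z) * (i + 1) / len(z) for i in range(IMG_PIXELS)]


def critic_input(image, labels):
    """Concatenate image pixels and feature labels into one critic input vector."""
    return list(image) + list(labels)


# Real pair: an image from the dataset plus its annotated labels.
real_image = [0.1, 0.5, 0.3, 0.9]
real_labels = [1.0, 0.0]
real_in = critic_input(real_image, real_labels)

# Fake pair (assumption): the labels are the feature values we wrote into
# the reserved entries of z before generating the image.
fake_labels = [float(random.randint(0, 1)) for _ in range(N_LABELS)]
z = fake_labels + [random.gauss(0.0, 1.0) for _ in range(Z_DIM - N_LABELS)]
fake_image = toy_generator(z)
fake_in = critic_input(fake_image, fake_labels)

# Both inputs have the same length, so one critic can score either pair.
print(len(real_in), len(fake_in))
```

The point is just that real and generated inputs end up with the same size (image pixels plus labels), which addresses the input-size mismatch you were worried about.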

Hope this helps!
