Hello,

In the C2W2 assingment, it is showing a technique to identify bias. When the encoder outputs the latent space with the means and stds for each feature, it is calculated the covariance matrix. Then the covariance matrix is used to identify the risk of bias.

I would like to know better how is the correct interpretation after generating the images with the decoder. Should the images show the same pattern for covariance as we have in the original covariance matrix ? But if we want our generations to “escape” those patterns for the sake of diversity?

In essence, my question is how to interpret the risk of bias having these informations: the latent space, the matrix of covariance, the means and stds for the features, and the generated image set.

Thanks in advance!

Hi @Jaspier , here are some thoughts about your questions.

- The latent space is a high-dimensional space representing the data’s underlying distribution. It is possible that the latent space may be biased, for instance, if there are more points in the latent space that correspond to images of one protected class than another. This could lead to the generator generating more images of that protected class, even if the discriminator is not biased.
- The matrix of covariance is a measure of the correlation between different features in the data. It is possible that the matrix of covariance may be biased, for instance, if there is a stronger correlation between two features in one protected class than in another. Again, this could lead to the generator generating images with more correlated features, even if the discriminator is not biased.
- The means and standards for the features are the average and standard deviation of each feature in the data. It is possible that the means and stds may be biased, for instance, if the average age of images in one protected class is different from the average age of images in another protected class. This could lead to the generator generating images that are older or younger, even if the discriminator is not biased.
- The generated image set is the set of images that the generator has generated. It is possible that the generated image set may be biased, for instance, if there are more images of one protected class than another. This could be due to the latent space, the matrix of covariance, the means and standards, or the discriminator.

So, there are a number of ways that bias can be introduced into GANs.