Density estimation - disadvantage of GANs

Hello there!

In the first video of Week 2, it is mentioned that it is difficult to do density estimation with GANs, and I feel the explanation was too rough and too quick. Could someone explain it in more detail, so as to have at least an intuition about it? It seems like an important concept to know about. Thank you!

Hi Andreea!
I hope that you are doing well.
Density estimation refers to the process of estimating the probability distribution of the data that a model is trying to generate. In layman’s terms, it captures the relationship between observations and their probability. So, for example, if we have certain particular features of a dog, it’s like estimating how often (i.e., with what probability) a particular feature appears and contributes to the generation of a dog image.
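
Here is a tiny sketch of what “estimating a density” looks like in practice. The toy data and the choice of scipy’s gaussian_kde are just my own illustration, not something from the lecture:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy 1-D "dataset": observations drawn from a mixture of two Gaussians.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 500),
                       rng.normal(3.0, 1.0, 500)])

# Density estimation: fit a model of p(x) to the observations...
kde = gaussian_kde(data)

# ...so we can ask "how probable is this value under the data distribution?"
print(kde.evaluate([-2.0, 0.0, 3.0]))  # high, low, high
```

This ability, evaluating p(x) for a given x, is exactly what a GAN does not give us, as explained below.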

However, if we want to get the probability density over our modelled features, this turns out to be very difficult with GANs. Why?

Let’s use this picture from the previous lectures for reference.
As can be seen in the figure, the objective of a GAN is to generate new, realistic samples similar to a given data distribution, rather than to understand and replicate the original probability distribution of the data. This comes from the adversarial nature of the GAN: the generator’s only motivation is to fool the discriminator (in layman’s terms). Nothing in this process guarantees that the generated samples accurately reflect the true distribution of the data, which is why density estimation is difficult here: we can’t estimate how often a particular feature makes up an image.
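
In code terms, a trained generator is a pure sampler. A rough PyTorch-flavoured sketch (the tiny network below is a hypothetical stand-in, not the course’s model):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in generator: noise z -> flattened 28x28 "image".
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784))

z = torch.randn(16, 64)   # draw noise from the prior
x_fake = G(z)             # sampling is easy: 16 generated "images"

# But there is nothing like G.log_prob(x_fake): the generator only maps
# noise to samples, so the model's density over images stays implicit.
```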
Hope it helps. If not, feel free to ask for more.

Regards,
Nithin


Hi! Thanks for the reply. Let me see if I understood.

Density estimation is hard with GANs because the GAN uses a proxy method for making the generated distribution similar to the real one: the feedback of the Discriminator, which only tells the Generator whether a generated image looks real or fake. These are individual points of the real distribution, let’s say “outputs” of the real distribution, that don’t tell the whole story. That’s why we need a large sample of generated images to approximate the density. In contrast, with VAEs we have more information about the generated distribution, because the encoder produces the parameters of the latent space from which the noise vector is sampled, and the decoder then uses that vector to generate an image.
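
To check my own understanding in code: the “large sample” idea would look something like drawing many samples from the generator and fitting a density estimator to them. The toy 2-D generator and the KDE below are purely my assumptions, and this only really works in low dimensions, not on real images:

```python
import torch
import torch.nn as nn
from scipy.stats import gaussian_kde

# Toy 2-D generator (illustrative only).
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))

with torch.no_grad():
    samples = G(torch.randn(10_000, 8)).numpy()  # many generated samples

# Approximate the generator's density p_g from its samples alone.
kde = gaussian_kde(samples.T)
print(kde.evaluate([[0.0], [0.0]]))  # estimated density at the point (0, 0)
```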

Since we have more control over how the noise vector is sampled, do we also have more control over the features that make up an image? Maybe not at the level of accessing exactly the “smile” feature to turn it on or off, but at least we know how often a particular feature leads to the generation of an image with a particular object.

Am I on the right track? :smiley:

Thanks,
Andreea

Hi, Andreea! Good to hear from you, again.
You are on the right track with respect to your understanding of GANs.
To be more precise with regard to VAEs: we have more information on the generated distribution because we feed real images (in contrast to the random noise in GANs) into the encoder block of the VAE, which finds a good way of representing each image in the latent space. From there we take a representation close to the original image and reconstruct a realistic image from it with the decoder. Hence, density estimation is easier, because VAEs are explicitly trying to replicate the original distribution, unlike GANs.
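
For completeness: the reason density estimation is more tractable with a VAE is that its training objective, the ELBO, is an explicit lower bound on the log-likelihood of the data,

$$
\log p(x) \;\ge\; \mathbb{E}_{q(z\mid x)}\big[\log p(x\mid z)\big] \;-\; \mathrm{KL}\big(q(z\mid x)\,\|\,p(z)\big),
$$

so after training you can feed an image through the encoder and decoder and compute a number that (approximately) tells you how likely that image is under the model. A GAN’s adversarial loss gives you no comparable quantity.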

Glad that you have got the idea! Have a nice day.
Regards,
Nithin
