Hello,

I am reading paper “From GAN to WGAN” and want to ask questions about it in this thread.

After posting Q1, I will post Q2, Q3, Q4, etc. in this thread as those questions come up.

Q1. this is a table on page 3.

What is “data x”? Where is it in the GAN? Is it the output of the generator G? Also, the table says “over data x” for p_g but “over real sample x” for p_r – how are those consistent?

Thank you

I have not read that paper, so I’m just guessing, but I’d say that p_g is the distribution of the output of the generator. That’s my guess as to what they mean by x in that context.

Note that this is consistent with how they describe p_r: the distribution of the real samples x.

The inputs to the discriminator are x and they either come from real samples or generated samples.
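If it helps, here is a toy numeric sketch of that point (the generator and discriminator below are placeholder functions, not the paper’s actual networks): both real samples and generator outputs arrive at the discriminator under the same name x.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z):
    # Toy "generator": a fixed linear map from noise z to a sample x.
    # Stand-in for a neural network G.
    return 2.0 * z + 1.0

def discriminator(x):
    # Toy "discriminator": squashes a score into (0, 1), the estimated
    # probability that x came from the real data distribution p_r.
    return 1.0 / (1.0 + np.exp(-x))

# Real samples x drawn from p_r (here a Gaussian, as a placeholder)...
x_real = rng.normal(loc=1.0, scale=0.5, size=4)

# ...and fake samples x = G(z), i.e. samples from p_g obtained by
# pushing noise z through the generator.
z = rng.normal(size=4)
x_fake = generator(z)

# Either way, the discriminator just sees an input called x.
d_real = discriminator(x_real)
d_fake = discriminator(x_fake)
```

The point of the sketch is only that x is the discriminator’s input variable, regardless of which distribution it was drawn from.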

So x is the input to the discriminator, whether it is the output of the generator or a real image…

This is a “pure math” thing. Unless you have gotten to about junior year as a math major and taken a course in Real Analysis (what comes after calculus and multivariate calculus), this is “over the top”. Do you know the difference between an “open” and a “closed” set?

The point is that an open set may have no minimum: the greatest lower bound is a limit of values in the set, but it’s not actually an element of the set. That’s why they use the term infimum: to handle that distinction.

Don’t worry about it.

Data x refers to the real examples used to train the discriminator D, which estimates the probability that a given sample comes from the real dataset.

I am not sure what you are asking here?

A generator G outputs synthetic samples given a noise variable input z (z brings in potential output diversity). It is trained to capture the real data distribution so that its generative samples can be as real as possible, or in other words, can trick the discriminator to offer a high probability.

So that means data x is not the output of generator G; the generator’s input is the noise variable z.

p_g ==> the generator’s distribution over data x: the generator tries to capture the real data distribution so that its samples can be as real as possible – in other words, so they can trick the discriminator into offering a high probability.

p_r ==> the data distribution over real samples x, i.e. the distribution of the real dataset that the discriminator compares against.

So basically the generator draws noise from p_z and produces fake samples, which define the distribution p_g. The discriminator compares those fakes against real samples from p_r, and that feedback pushes p_g closer to the real data distribution, so the generator produces increasingly realistic fakes while the discriminator keeps raising the bar.

Basically, the discriminator is the quality checker and the generator keeps trying to create higher-quality samples to fool it, which in turn also improves the discriminator at catching flaws in the generator’s fake samples.
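To make that feedback loop concrete, here is a deliberately oversimplified numeric analogy (not an actual GAN: the “generator” is a single scalar shift, and the “discriminator feedback” is just the gap between the sample means; real GANs update network weights by gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)

g_shift = 0.0    # the "generator": shifts noise z toward the data mean
data_mean = 3.0  # the real data here is Gaussian around 3.0

for step in range(200):
    z = rng.normal(size=64)                        # noise input z from p_z
    x_fake = z + g_shift                           # fake samples from p_g
    x_real = rng.normal(data_mean, 1.0, size=64)   # real samples from p_r
    # "Discriminator feedback" stand-in: the gap between real and fake
    # sample means; the generator moves to shrink that gap.
    g_shift += 0.1 * (x_real.mean() - x_fake.mean())
```

After the loop, g_shift has been pushed close to the real data mean, which is the analogue of p_g converging toward p_r.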

The main reason the Wasserstein formulation uses an infimum lies in the definition of the distance itself. BCE loss measures how far each prediction is from a ground truth of 1 or 0, so the discriminator’s output is bounded between 0 and 1. W-loss instead measures how far the critic’s prediction on the real samples is from its prediction on the fake samples; the critic’s output is unbounded, and the loss approximates the Earth Mover’s Distance between the real and generated distributions. Since that distance is defined as the smallest possible transport cost over all transport plans, the greatest lower bound (infimum) is the right way to express “smallest cost”.
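A minimal numeric sketch of that contrast (toy numbers, not the course’s exact formulas): BCE compares each prediction to a fixed label, while the critic loss compares the scores on real and fake batches directly.

```python
import numpy as np

def bce_discriminator_loss(d_real, d_fake):
    # BCE compares each prediction to a ground-truth label of 1 (real)
    # or 0 (fake); the predictions must lie strictly in (0, 1).
    return -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))

def w_critic_loss(c_real, c_fake):
    # W-loss compares the critic's scores on real vs. fake samples
    # directly; the scores are unbounded real numbers.
    return -(np.mean(c_real) - np.mean(c_fake))

# Bounded discriminator outputs for BCE...
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.2, 0.1])
bce = bce_discriminator_loss(d_real, d_fake)

# ...versus unbounded critic scores for W-loss.
c_real = np.array([3.7, 5.2])
c_fake = np.array([-1.4, -8.0])
wl = w_critic_loss(c_real, c_fake)
```

Note how the critic scores can be any real numbers: that is why the W-loss setup needs the Lipschitz condition covered in the Condition on Wasserstein Critic video.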

There are two videos in the same week addressing your doubt. Kindly go through Wasserstein Loss and Condition on Wasserstein Critic; those should explain the reason for choosing the infimum rather than the minimum.

Regards

DP

To give a concrete example of the point here, consider the following set:

\{x \in \mathbb{R}: 0 < x < 1\}

What is the minimum value in that set? There is none, right? If you pick any element of that set that is “close” to 0, there are an uncountably infinite number of other values in that set that are closer to 0. You want to say 0, but 0 is not an element of that set.

So we need to call 0 something different than the minimum of the set: it is the “greatest lower bound” or “infimum” of that set. They are basically inventing a new word (but derived from Latin, of course) to represent that concept. By analogy, we would say that 1 is the “supremum” of that set (the least upper bound of the set).
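The same point can be checked numerically with a toy sequence inside that open interval (just an illustration, not something from the paper):

```python
# The sequence x_n = 1/2**n lies in the open interval (0, 1) for n >= 1,
# gets arbitrarily close to 0, yet never equals 0. So 0 is the infimum
# (greatest lower bound) of the set, not its minimum.
values = [1.0 / 2**n for n in range(1, 30)]

inside = all(0 < v < 1 for v in values)  # every term is inside the set
smallest = min(values)                   # positive, but tiny
```

Here `inside` is True and `smallest` is positive but smaller than 1e-8, showing that the set’s values approach 0 without ever reaching it.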

You would certainly be forgiven for thinking that this is not an important or useful distinction, but mathematicians have parties about stuff like this.

Thanks, Paul. I understand now.