My guess is that the problem W-GAN solves is mode collapse, along with improving training stability. If we encode the class, then mode collapse is no longer an issue. We might prefer BCE loss over the W-GAN loss because it is faster.
@Andy_Davidson, I think the reason for using a regular GAN rather than a WGAN here has more to do with keeping the GAN basic in order to focus on the new concept being introduced - in this case, conditional generation.
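For anyone following along, here's a rough sketch of the loss difference being discussed - this isn't the course's exact code, and the tensor names (`disc_pred_real`, `disc_pred_fake`) are just placeholders:

```python
import torch
import torch.nn.functional as F

# Pretend discriminator/critic outputs on a batch of real and fake images.
disc_pred_real = torch.randn(16, 1)
disc_pred_fake = torch.randn(16, 1)

# "Regular" GAN: outputs are treated as logits and scored with BCE against
# labels of 1 (real) and 0 (fake).
bce_disc_loss = (
    F.binary_cross_entropy_with_logits(disc_pred_real, torch.ones_like(disc_pred_real))
    + F.binary_cross_entropy_with_logits(disc_pred_fake, torch.zeros_like(disc_pred_fake))
)

# W-GAN: the critic outputs unbounded scores, and the loss is just the
# difference of means (plus a gradient penalty or weight clipping in practice).
wgan_critic_loss = disc_pred_fake.mean() - disc_pred_real.mean()
```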
You make a good point that, since we are encoding the class as part of the input to the generator, there shouldn't be a problem with mode collapse for that class. But I think the model could still be prone to mode collapse based on some other feature - say, fur color: the generator learns that dogs with gray fur tend to fool the discriminator, so it starts generating only images of dogs with gray fur.
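Just to make the "class as part of the input" idea concrete, here's a minimal sketch of how the conditioning is typically wired up - assuming a generator that takes a flat vector; the dimensions are made up:

```python
import torch
import torch.nn.functional as F

z_dim, n_classes, batch_size = 64, 10, 16

noise = torch.randn(batch_size, z_dim)                      # random noise z
labels = torch.randint(0, n_classes, (batch_size,))         # class indices
one_hot = F.one_hot(labels, num_classes=n_classes).float()  # one-hot class encoding

# The class is appended to the noise, so the generator can't "drop" it -
# but other features (like fur color) are still free to collapse.
gen_input = torch.cat([noise, one_hot], dim=1)              # shape: (16, 74)
```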
You are in good company with your thought of combining W-GAN concepts with conditional generation. If you google it, you can see that people have been trying it, mostly fairly recently (in the last 3 or 4 years), I think because W-GANs are still relatively new.
Here are a couple of papers on applying W-GAN concepts to conditional generation (cGAN), if you're interested: