I just watched a video about BCE loss used in CycleGANs. If BCE loss is so great, why don't we use it in all GANs, not just CycleGANs?
Why do we need W-loss, which is more complex to calculate?
And conversely, why do we need BCE loss, which is flat in its outer regions and causes vanishing gradients during training?
Thanks
Hi @Dennis_Sinitsky,
I’m not sure which video about BCE loss in CycleGANs you’re referring to - one of the videos from this course or something from outside it - so I’ll try to answer in general terms.
BCE loss is a natural fit for binary classification, so it has traditionally been used in GANs where the discriminator decides between two results, such as real or fake. But, as you mention, BCE loss is prone to vanishing gradients (it saturates in the outer regions) and to mode collapse, so as work on GANs has matured, new types of loss calculations have been developed. W-loss is one of those newer options that helps guard against both mode collapse and vanishing gradients.
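To make the contrast concrete, here is a minimal PyTorch sketch (my own illustration, not course code) of the two discriminator-side losses. The linear `disc` and the random tensors are toy stand-ins, and the W-loss version leaves out the Lipschitz constraint (weight clipping or gradient penalty) that a real WGAN critic needs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy critic/discriminator and data, just to make the losses concrete.
disc = nn.Linear(64, 1)           # stand-in for a real discriminator network
real_images = torch.randn(8, 64)  # batch of "real" feature vectors
fake_images = torch.randn(8, 64)  # batch of "fake" feature vectors

real_logits = disc(real_images)
fake_logits = disc(fake_images)

# BCE discriminator loss: push real toward 1, fake toward 0. The sigmoid
# inside BCE saturates for confident predictions, which is the "flat outer
# regions" / vanishing-gradient issue from the question above.
bce_disc_loss = (
    F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
    + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
)

# W-loss critic version: no sigmoid, so scores are unbounded and the gradient
# doesn't saturate. In practice the critic must also be kept (approximately)
# 1-Lipschitz via weight clipping or a gradient penalty, omitted here.
w_critic_loss = fake_logits.mean() - real_logits.mean()
```

The extra complexity you asked about comes from that Lipschitz constraint, not from the loss expression itself, which is just a difference of means.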
For CycleGANs, there is actually a whole set of loss components combined - adversarial loss, cycle consistency loss, and identity loss. One of the videos in week 3 discusses using Least Squares Loss (aka mean squared error, or MSE) for the adversarial portion of CycleGANs, instead of BCE - another approach that helps with mode collapse and vanishing gradient issues.
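Here is a rough one-direction sketch of how those components combine in a CycleGAN generator loss (again my own illustration, not course code). `gen_AB`, `gen_BA`, and `disc_B` are toy stand-ins for the real conv nets, and the relative weights are illustrative, not course-specific values:

```python
import torch
import torch.nn as nn

# Stand-in networks and data so the loss terms are concrete.
gen_AB, gen_BA = nn.Linear(64, 64), nn.Linear(64, 64)  # A->B and B->A generators
disc_B = nn.Linear(64, 1)                               # discriminator for domain B
real_A, real_B = torch.randn(8, 64), torch.randn(8, 64)

mse, l1 = nn.MSELoss(), nn.L1Loss()

fake_B = gen_AB(real_A)

# Adversarial term: least squares (MSE) against a "real" target of 1,
# instead of BCE - the LSGAN-style trick mentioned above.
adv_loss = mse(disc_B(fake_B), torch.ones(8, 1))

# Cycle-consistency term: A -> B -> A should reconstruct the original input.
cycle_loss = l1(gen_BA(fake_B), real_A)

# Identity term: feeding a real B image to gen_AB should change it very little.
identity_loss = l1(gen_AB(real_B), real_B)

# Weighted sum; the 10.0 and 0.5 weights here are illustrative.
total_gen_loss = adv_loss + 10.0 * cycle_loss + 0.5 * identity_loss
```

The full CycleGAN objective applies the same three terms in the other direction (B -> A) as well and sums both.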