W-loss: how is it any different from BCE loss if we constrain it to be 1-L continuous?

  • I wonder how this loss is in fact any different from BCE loss if we are restricting the norm of the loss to be between -1 and 1. Is the difference that it can now output values between -1 and 1, unlike BCE loss, which can only output values between 0 and 1?

@Usha_Kalyani_Alluri, with W-loss, we’re not restricting the loss to be between -1 and 1. The loss can still grow without bound, but with 1-L continuity, we’re trying to keep the rate of growth (the slope of the critic’s output with respect to its input) to at most 1 at any point along the curve. A subtle difference, but this is enough to help W-loss avoid the vanishing gradient issue we see with BCE loss.
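To make the vanishing gradient point concrete, here is a minimal sketch in plain Python. It assumes the saturating (minimax) form of the BCE generator loss, log(1 - sigmoid(score)), and the usual W-loss generator term, -score, where score is the raw discriminator or critic output on a fake sample. The function names are made up for this illustration and are not from any library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce_generator_grad(score):
    # Assumed generator loss: log(1 - sigmoid(score)).
    # Its derivative with respect to score is -sigmoid(score),
    # which shrinks toward 0 as score becomes very negative
    # (i.e. the discriminator is confident the sample is fake).
    return -sigmoid(score)

def w_generator_grad(score):
    # Assumed generator loss: -score (W-loss, critic output unbounded).
    # Its derivative with respect to score is a constant -1,
    # no matter how confident the critic is.
    return -1.0

for score in (0.0, -5.0, -20.0):
    print(score, bce_generator_grad(score), w_generator_grad(score))
```

The BCE gradient fades as the discriminator gets more confident, while the W-loss gradient with respect to the critic score stays constant. The 1-L constraint then applies to how fast the critic score itself can change with respect to the input image, so the signal reaching the generator stays bounded rather than vanishing or blowing up.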