Week 2 lesson: softmax

Link: https://www.coursera.org/learn/advanced-learning-algorithms/lecture/mzLuU/softmax

When the lesson starts, a logistic regression example is given for y=1 and y=0, and then the equations are built for softmax. The problem I see is that the softmax equations are built for y=1,2,3,…,N and not y=0,1,2,3. Isn't there a possibility that the output matches none of 1,2,3,4? Even mathematically, how can you say the loss is -log a2 or -log a3 when it was derived based on y=0 earlier? Can someone explain?

Softmax doesn’t use y as integers.

Softmax takes ‘n’ floating-point inputs (the raw scores, which can be any real numbers) and rescales them into ‘n’ values between 0.0 and 1.0 whose sum is exactly 1.0. The label y only selects which of those outputs goes into the loss.
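To make the rescaling concrete, here is a minimal NumPy sketch. The function name `softmax` and the example scores are my own choices for illustration, not something taken from the lecture or its lab code.

```python
import numpy as np

def softmax(z):
    """Map a 1-D array of real-valued scores to probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    # Subtracting the max leaves the result unchanged but avoids overflow in exp.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical scores for four classes; any real numbers are allowed.
a = softmax([2.0, 1.0, 0.1, -1.0])
print(a)        # four positive values, largest for the largest score
print(a.sum())  # 1.0 up to floating-point rounding
```

Note that no label appears anywhere in this computation: the outputs are one probability per class, whatever the classes happen to be called.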

Hello @Pankaj_Shukla,

You are right that labels are normally zero-based, and this is the convention in TensorFlow and many other packages. The lecture is just using a one-based approach, so we have to keep the difference in mind.
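As a rough sketch of why the numbering does not change the math: the loss is always minus the log of the probability assigned to the true class, and the label only tells us which output to pick. The helper names below (`loss_one_based`, `loss_zero_based`) and the scores are hypothetical, and the sketch assumes every example belongs to exactly one of the N classes, which is what softmax itself assumes.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

def loss_one_based(a, y):
    # Lecture convention: y is in {1, ..., N}, so class j uses a_j = a[j - 1].
    return -np.log(a[y - 1])

def loss_zero_based(a, y):
    # Common library convention: y is in {0, ..., N-1}, so index directly.
    return -np.log(a[y])

a = softmax([0.5, 2.0, -1.0, 0.0])  # hypothetical scores for N = 4 classes

# The same class is called y=2 in the lecture and y=1 in zero-based code,
# and both conventions give the same loss value.
print(loss_one_based(a, 2), loss_zero_based(a, 1))
```

So "-log a2" in the lecture and "-log of the output at index 1" in zero-based code refer to the same number. Only the names of the classes shift by one.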

Cheers,
Raymond