Link: https://www.coursera.org/learn/advanced-learning-algorithms/lecture/mzLuU/softmax
When the lesson starts, a logistic regression example is given for y = 1 and y = 0, and then the equations are built up for softmax. The problem I see is that the softmax equations are written for y = 1, 2, 3, …, N and not for y = 0, 1, 2, 3. Isn't there a possibility that the output matches none of 1, 2, 3, 4? Even mathematically, how can you say the loss is -log a2 or -log a3, when the loss was derived based on y = 0 earlier? Can someone explain?
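For reference, here is a minimal sketch of the setup the question is about, in plain Python. The function names `softmax`, `softmax_loss` and `logistic_loss` are hypothetical and do not come from the course labs. The sketch computes the softmax activations a_1..a_N, uses the loss -log a_y for a label y in 1..N, and checks that the activations always sum to 1. It also assumes that the logistic regression case can be compared with an N = 2 softmax whose second logit is fixed at 0. Under that assumption, softmax label 1 plays the role of y = 1 and label 2 plays the role of y = 0.

```python
import math

def softmax(z):
    """Return softmax activations a_1..a_N for logits z_1..z_N (stored as a 0-based list)."""
    m = max(z)  # subtract the max logit for numerical stability
    exps = [math.exp(zj - m) for zj in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(z, y):
    """Hypothetical helper: loss -log a_y for a label y in 1..N."""
    a = softmax(z)
    return -math.log(a[y - 1])  # shift the 1-based label to a 0-based list index

def logistic_loss(z, y):
    """Hypothetical helper: logistic regression loss for one logit z and a label y in {0, 1}."""
    a = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
    return -math.log(a) if y == 1 else -math.log(1.0 - a)

# Softmax activations form a probability distribution over the N labels.
print(sum(softmax([2.0, 1.0, 0.1])))

# With N = 2 and logits [z, 0], softmax label 1 matches y = 1 and label 2 matches y = 0.
z = 1.5
print(softmax_loss([z, 0.0], 1), logistic_loss(z, 1))
print(softmax_loss([z, 0.0], 2), logistic_loss(z, 0))
```

In this sketch, whether the labels are called 0..N-1 or 1..N only changes the list index used to pick a_y. Each training example has exactly one true label, and the loss takes -log of the activation for that label.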