Hello,
In the “Training a Softmax Classifier” video, Andrew Ng presented softmax regression as the natural generalization of logistic regression, with the cost for a single example defined as the cross-entropy of the softmax prediction: L(yhat, y) = -sum_j y_j * log(yhat_j).
I get that the important thing, at first glance, is that you “punish” your model for not recognizing that it was a cat (the cost being -log(0.2) here), regardless of what it predicted for the other classes.
But is that really the only thing that matters? What I mean is: in this picture, the prediction is [0.3 0.2 0.1 0.4] instead of [0 1 0 0], but it could have been much “worse”: for example, the network could have been convinced it was looking at a dog, outputting, say, [0 0.2 0.8 0].
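To make that concrete, here is a quick NumPy check of my reading of the cost function (assuming the target one-hot vector is [0 1 0 0], i.e. “cat” is the second class): both predictions get exactly the same cross-entropy, -log(0.2).

```python
import numpy as np

def cross_entropy(y_hat, y):
    # standard softmax cross-entropy for one example: -sum_j y_j * log(y_hat_j)
    eps = 1e-12  # avoid log(0)
    return -np.sum(y * np.log(y_hat + eps))

y = np.array([0, 1, 0, 0])                   # ground truth: it's a cat
unsure = np.array([0.3, 0.2, 0.1, 0.4])      # "doesn't really know"
confident = np.array([0.0, 0.2, 0.8, 0.0])   # "convinced it's a dog"

print(cross_entropy(unsure, y))     # ~1.609, i.e. -log(0.2)
print(cross_entropy(confident, y))  # ~1.609 as well: the cost only sees y_hat for the true class
```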
I feel like we could benefit from telling our network that “not knowing” is better than “being convinced of a mistake”.
To this end, maybe we could modify the cost function with, say, some L2 (or higher-order) norm of the probabilities assigned to the other classes (for example here, adding an extra cost of 0.3^2 + 0.1^2 + 0.4^2)?
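Just to illustrate the idea (this is only a sketch of my proposal, not something from the course, and the penalty weight `lam` is a made-up parameter), the modified cost could look like this:

```python
import numpy as np

def penalized_cost(y_hat, y, lam=1.0):
    # cross-entropy plus a squared penalty on the non-target probabilities (my proposal)
    eps = 1e-12
    ce = -np.sum(y * np.log(y_hat + eps))
    penalty = np.sum(((1 - y) * y_hat) ** 2)  # mask out the true class, square the rest
    return ce + lam * penalty

y = np.array([0, 1, 0, 0])
print(penalized_cost(np.array([0.3, 0.2, 0.1, 0.4]), y))  # ~1.609 + 0.26
print(penalized_cost(np.array([0.0, 0.2, 0.8, 0.0]), y))  # ~1.609 + 0.64: "confidently wrong" now costs more
```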
I’d be glad to hear your thoughts on this.
Thanks for reading!