When the “loss function” is introduced for “Logistic Regression”, Professor Ng starts by giving the formula for the algebraized (“one-linerized”) cross-entropy loss for the 2-class case, and then justifies why it looks like a good choice.
I would suggest doing it the reverse way:
- Introduce the idea of the cross-entropy loss for the 2-class case. In particular, name the concept “cross-entropy loss” so that students can look it up to learn more about it. At this point it is not yet a single formula but distinguishes the cases y = 1 and y = 0 explicitly (see the sketch after this list).
- In this form it is easy to see that the loss takes on values the way we would want from an adequate loss function: near 0 when the prediction matches the label, and large when the prediction confidently misses it.
- Set up the single algebraic loss formula using the usual trick of summing over the two cases: y * (value in case y = 1) + (1 - y) * (value in case y = 0). Exactly one of the two terms survives for any given label.
- And that’s it.
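For concreteness, here is how the two steps could be written out (a sketch; I am using ŷ for the model’s predicted probability of class 1, which is my notation, not necessarily the course’s):

```latex
% Step 1: case-by-case cross-entropy loss for a 2-class problem,
% with \hat{y} \in (0, 1) the predicted probability of class 1 (my notation):
\[
L(\hat{y}, y) =
\begin{cases}
  -\log(\hat{y})     & \text{if } y = 1, \\
  -\log(1 - \hat{y}) & \text{if } y = 0.
\end{cases}
\]
% Step 2: the summing trick collapses both cases into one line,
% since exactly one of the two terms is nonzero for y in {0, 1}:
\[
L(\hat{y}, y) = -\bigl( y \log(\hat{y}) + (1 - y)\,\log(1 - \hat{y}) \bigr)
\]
```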
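And a minimal Python sketch (function names are mine) checking that the case-by-case version and the one-liner agree, which is the whole point of the summing trick:

```python
import numpy as np

def cross_entropy_case_by_case(y_hat: float, y: int) -> float:
    """Two-case definition: pedagogically explicit."""
    if y == 1:
        return -np.log(y_hat)        # want y_hat close to 1
    else:
        return -np.log(1.0 - y_hat)  # want y_hat close to 0

def cross_entropy_one_liner(y_hat: float, y: int) -> float:
    """Single formula via the y * (...) + (1 - y) * (...) trick."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1.0 - y_hat))

# The two definitions agree for both labels and any prediction in (0, 1):
for y in (0, 1):
    for y_hat in (0.1, 0.5, 0.9):
        assert np.isclose(cross_entropy_case_by_case(y_hat, y),
                          cross_entropy_one_liner(y_hat, y))
```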