Suggestion for introduction of the loss function in "Logistic Regression"

When the “loss function” is introduced in “Logistic Regression”, Professor Ng starts by giving the formula for the algebraized (one-linerized?) “cross-entropy loss” for the 2-class case, and then justifies why it looks like a good choice.
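For reference, this is the one-liner in question, as I remember it from the lecture (using the course’s $\hat{y}$ for the predicted probability):

$$
\mathcal{L}(\hat{y}, y) = -\bigl(\,y \log(\hat{y}) + (1-y)\log(1-\hat{y})\,\bigr)
$$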

I would suggest doing it the reverse way:

  1. Introduce the idea of the “cross-entropy loss” for the 2-class case. In particular, name the concept of cross-entropy loss so that students who would like to learn more about it can look it up. At this point it is not yet a single formula; it distinguishes the cases y = 0 and y = 1 explicitly.

  2. In this form it is easy to see that the loss takes on values the way we would want from an adequate loss function: it is close to 0 when the prediction matches the label and grows without bound as the prediction approaches the wrong extreme.

  3. Set up a single algebraic loss formula using the usual trick of summing over the cases: y * (value in case y = 1) + (1 - y) * (value in case y = 0). (A sketch follows after this list.)

  4. And that’s it.
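To make steps 1–3 concrete, here is a sketch of what the two-case form could look like on paper:

$$
\mathcal{L}(\hat{y}, y) =
\begin{cases}
-\log(\hat{y}) & \text{if } y = 1 \\
-\log(1 - \hat{y}) & \text{if } y = 0
\end{cases}
$$

Step 3’s summing trick then recovers the one-liner above, since exactly one of the two terms survives for each value of y. And a quick numerical check in plain Python (the function names are just illustrative) that the two forms agree:

```python
import math

def loss_two_case(y_hat: float, y: int) -> float:
    """Cross-entropy loss with the two cases kept explicit (step 1)."""
    if y == 1:
        return -math.log(y_hat)        # we want y_hat close to 1
    return -math.log(1.0 - y_hat)      # we want y_hat close to 0

def loss_one_liner(y_hat: float, y: int) -> float:
    """The same loss after the y * (...) + (1 - y) * (...) trick (step 3)."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))

# The two forms agree on both branches:
for y in (0, 1):
    for y_hat in (0.1, 0.5, 0.9):
        assert math.isclose(loss_two_case(y_hat, y), loss_one_liner(y_hat, y))
```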

Thanks for your suggestion.

I just found out that the one-liner formula can also be seen as arising from a maximum-likelihood (log-likelihood) estimation approach.

The “generative model” in this case is the neural network, with the weights as the model parameters over which we optimize.
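A sketch of the argument, as I currently understand it: for one example, the model’s predicted probability of the label is a Bernoulli likelihood,

$$
p(y \mid x) = \hat{y}^{\,y}\,(1 - \hat{y})^{\,1-y},
$$

so its log is

$$
\log p(y \mid x) = y \log(\hat{y}) + (1 - y)\log(1 - \hat{y}),
$$

and maximizing this log-likelihood over the weights is the same as minimizing its negative, which is exactly the one-liner loss.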

This is explained later in the course, though much too quickly :grimacing: I will need to delve deeper into this.