Logistic Regression Cost Function Intuition (starts around 3:24)

Hello and thank you for your help.

I am trying to understand the intuition for the logistic loss function. Here is my idea based on the class.

If y is 1, then ŷ should be close to 1 so that the loss L(ŷ, y) is close to 0.
If y is 0, then ŷ should be close to 0 so that the loss L(ŷ, y) is close to 0.

Could you explain the intuition differently? I have tried reading some of the other forum discussions and thinking about it, but nothing has helped.

That sounds correct, but the point is that this is just the first step. The next question is, “OK, if that is your goal, then what is a function that can express it?” How does the log loss function help with that? Prof Ng goes on to explain this in the lecture.

Here’s a thread that shows the graph of the log function between 0 and 1 and discusses this a bit more.
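To make that concrete, here is a minimal sketch (my own, not from the lecture or that thread) that evaluates the log loss L(ŷ, y) = −(y·log(ŷ) + (1 − y)·log(1 − ŷ)) at a few predicted probabilities, so you can see how the loss blows up as ŷ moves away from the true label:

```python
import numpy as np

def log_loss(y_hat, y):
    """Logistic (log) loss for a single prediction:
    L(y_hat, y) = -(y * log(y_hat) + (1 - y) * log(1 - y_hat))"""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Predicted probabilities ranging from "confidently wrong" to "confidently right"
for y_hat in [0.01, 0.1, 0.5, 0.9, 0.99]:
    print(f"y=1, y_hat={y_hat:.2f} -> loss={log_loss(y_hat, 1):.3f}")
    print(f"y=0, y_hat={y_hat:.2f} -> loss={log_loss(y_hat, 0):.3f}")
```

When y = 1, only the −log(ŷ) term is active, so pushing ŷ toward 1 drives the loss toward 0; when y = 0, only −log(1 − ŷ) is active, so pushing ŷ toward 0 does the same. A confident wrong prediction (e.g. y = 1, ŷ = 0.01) is punished with a very large loss.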

@cstockman as to intuition, I personally find it clearer to understand this loss function by its alternate name, cross-entropy loss, where entropy can be seen as a measure of ‘information’ in a system.

From Claude E. Shannon and Warren Weaver’s classic ‘The Mathematical Theory of Communication’, p. 12:

"Now let us return to the idea of information. When we have an information source which is producing a message by which successively selecting discrete symbols (letters, words, musical notes, spots of a certain size, etc.), the probability of choice of the various symbols at one stage of the process being dependent on the previous choices (i.e., a Markoff process), what about the information associated with this procedure?

The quantity which uniquely meets the natural requirements that one sets up for ‘information’ turns out to be exactly that which is known in thermodynamics as entropy. It is expressed in terms of the various probabilities of forming messages, and the probabilities that, when in those stages, certain symbols be chosen next. The formula, moreover, involves the logarithm of probabilities, so that it is a natural generalization of the logarithmic measure spoken of above in connection with simple cases."
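To make that “logarithm of probabilities” concrete, here is a small sketch (mine, not from the book) that computes the Shannon entropy H(p) = −Σ p·log₂(p) of a few distributions: a certain outcome carries no information (entropy 0), while a 50/50 coin flip carries the most (1 bit).

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H(p) = -sum(p * log2(p)),
    treating zero-probability terms as contributing 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy([1.0, 0.0]))   # 0.0 bits   -- outcome is certain, no information
print(entropy([0.9, 0.1]))   # ~0.469 bits
print(entropy([0.5, 0.5]))   # 1.0 bit    -- maximally uncertain
```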

Cross-entropy takes this a bit further in that it compares two probability distributions: here, the predicted distribution (ŷ, 1 − ŷ) against the true label distribution for the binary classifier, Y = 0 or Y = 1. Low loss (and thus low entropy) suggests a strong ‘signal’ that directs us to the right choice given the inputs provided, which is why, through repeated forward/back prop, we try to get the network to minimize this loss and find the combination of weights that gives us the strongest ‘signal’ one way or the other.

(P.s. Shannon defines entropy, but not cross-entropy. Still a good read).
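If it helps, here is a small sketch (my own wording, not from the course) showing that the binary cross-entropy between the predicted distribution (ŷ, 1 − ŷ) and the one-hot label distribution is exactly the log loss above: a confident correct prediction gives a low loss, while a confident wrong one gives a large loss that pushes the weights the other way.

```python
import numpy as np

def cross_entropy(p_true, p_pred):
    """Cross-entropy H(p_true, p_pred) = -sum(p_true * log(p_pred))."""
    p_true = np.asarray(p_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return -np.sum(p_true * np.log(p_pred))

# True label y = 1  ->  one-hot distribution over (y=1, y=0) is (1, 0)
y_true = [1.0, 0.0]

confident_right = [0.95, 0.05]   # y_hat = 0.95
confident_wrong = [0.05, 0.95]   # y_hat = 0.05

print(cross_entropy(y_true, confident_right))  # ~0.051 -- low loss
print(cross_entropy(y_true, confident_wrong))  # ~3.0   -- high loss
```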

Thank you both…

I will be looking these over again to gain more intuition.