# Cost Function of Logistic Regression, Binary Classification

I have a question regarding the logistic regression cost function.

Should the x value from the training data set be placed in both logarithm terms of the cost function, log(f(x)) and log(1-f(x)), or should it only be placed in one of them based on the actual output y for that x, similar to how we did it in linear regression?

The complete formula for the logistic regression cost is:
`-(1/m) * sum over samples of (y_target * log(f(x)) + (1-y_target) * log(1-f(x)))`
where f(x) is y_predicted (the model's output for that x).

This formula already "chooses" which log term to use by making the result of one of those terms 0. Therefore, you can place the x value in both log functions.

The y_target values can only be 0 or 1.

If y_target = 0, then `y_target * log(f(x))` would be 0, and so you would be left with `(1-y_target) * log(1-f(x))` or just `log(1-f(x))` for that sample.

If y_target = 1, then `(1-y_target) * log(1-f(x))` would be 0, and so you would be left with `y_target * log(f(x))` or just `log(f(x))` for that sample.
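The selection mechanism above can be sketched in a few lines of Python (the `sample_loss` helper name is just for illustration; natural log is assumed):

```python
import math

def sample_loss(y_target, f_x):
    """Per-sample logistic loss: y_target zeroes out one of the two log terms.

    y_target must be exactly 0 or 1; f_x is the model output, strictly in (0, 1).
    """
    return -(y_target * math.log(f_x) + (1 - y_target) * math.log(1 - f_x))

# y_target = 1: only the log(f(x)) term survives
assert sample_loss(1, 0.8) == -math.log(0.8)

# y_target = 0: only the log(1 - f(x)) term survives
assert sample_loss(0, 0.8) == -math.log(1 - 0.8)
```

Note that x only enters through f(x); the same f(x) value is plugged into both terms, and y_target alone decides which term contributes.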


You answered me with something that Prof. Andrew told us, and that's already in my knowledge.

My question:
`y_target * log(f(x)) + (1-y_target) * log(1-f(x))`

Suppose x=2. Then would it be
a) `y_target * log(f(x=2)) + (1-y_target) * log(1-f(x=2))`, with x placed in both terms (where y_target=1 makes the second term's factor (1-y_target)=0)?
or
b) either `log(f(x=2))` or `log(1-f(x=2))`, depending on the exact output in between 0 and 1?

It's not entirely clear to me what you're asking.

If x=2 in that sample, then you can always use f(x=2) in both the log functions. The formula will "choose" which log function is used for the cost. Which log function is chosen depends on y_target; it does not depend on x or f(x).

Although they may look similar, y_target is a completely different value from y_prediction (or f(x)).

The y_target is provided in the data, and must be exactly 0 or exactly 1. For example, say you are using logistic regression to decide if an image contains a dog or does not contain a dog. In your data, you would have images and labels for "contains dog" or "no dog". That label is y_target, and each image can either contain (y_target=1) or not contain (y_target=0) a dog. In this case, y_target cannot be 0.5, since it is a yes/no (binary) label.

The y_prediction (or f(x)), on the other hand, is computed/output by the model, and is different from y_target. It is possible for f(x) to be any real number between 0 and 1 (assuming you have a sigmoid activation function). For example, y_prediction, or f(x), can be 0.6. The y_prediction can be thought of as the probability that an image contains a dog.
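To make the distinction concrete, here is a small sketch of the averaged cost over a toy dataset (the labels and probabilities below are made up for illustration):

```python
import math

# Labels from the data: each image either contains a dog (1) or not (0).
y_target = [1, 0, 1]

# Model outputs: probabilities strictly between 0 and 1 (e.g. from a sigmoid).
y_prediction = [0.9, 0.2, 0.6]

# Average logistic cost over the samples; y_target selects which
# log term contributes for each sample.
m = len(y_target)
cost = -sum(
    y * math.log(f) + (1 - y) * math.log(1 - f)
    for y, f in zip(y_target, y_prediction)
) / m

print(cost)  # roughly 0.2798
```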


I'll do a little guessing here as to what your question is.

Yes, x is used in both places where f(x) is noted.

• f(x) is the sigmoid of (x*w + b), and those values will be between 0 and 1, but never exactly equal to either.

• The y_target values are limited to 0 or 1 - there are no in-between values.
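The first bullet can be sketched as follows (the single-feature model `f` and its parameter values are just illustrative):

```python
import math

def f(x, w, b):
    """Hypothetical single-feature model: sigmoid of the linear score x*w + b."""
    z = x * w + b
    return 1.0 / (1.0 + math.exp(-z))

# z = 2*0.5 - 1 = 0, so the sigmoid returns exactly 0.5.
assert f(2.0, 0.5, -1.0) == 0.5

# In exact arithmetic the sigmoid never reaches 0 or 1
# (floating point can round for very large |z|).
assert 0 < f(5.0, 1.0, 0.0) < 1
assert 0 < f(-5.0, 1.0, 0.0) < 1
```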

Although I don't know exactly where this equation comes from.

Hi Tom,

I understand this is a math question, yet it will help me to understand why the definition of the logistic loss function works. Could you explain why the -log(1-f) graph looks the way it does (when 0 < f < 1, -log(1-f) ranges from 0 to infinity)?

Many thanks
Christina

We don't use values of exactly 0 and 1, since the log(f) and log(1-f) functions explode there.

Just use values that are extremely close to 0 and 1.

The graph looks like that because that's just the way the formula and the log function work!

It sounds like you want to be convinced that for `0 < f < 1`, we get `0 < -log(1-f) < infinity`. You can actually try to plot this out by hand (I calculated the log numbers here using Google search, which gives base-10 logs; the natural log used in the lectures produces the same shape, just scaled by a constant factor).

f = 0.01, then -log(1-f) = 0.0043648054
f = 0.1, then -log(1-f) = 0.04575749056
f = 0.2, then -log(1-f) = 0.096910013
f = 0.3, then -log(1-f) = 0.15490195998
f = 0.4, then -log(1-f) = 0.22184874961
f = 0.5, then -log(1-f) = 0.30102999566
f = 0.6, then -log(1-f) = 0.39794000867
f = 0.7, then -log(1-f) = 0.52287874528
f = 0.8, then -log(1-f) = 0.69897000433
f = 0.9, then -log(1-f) = 1
f = 0.99, then -log(1-f) = 2
f = 0.999999, then -log(1-f) = 5.99999999999
f = 0.99999999999999, then -log(1-f) = 14.0003472608

If you plot out `f` and `-log(1-f)` by putting the dots on a grid and drawing a line through them (or use an Excel/Sheets program ;)), you will see this looks like the one in the lecture slide.
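If you'd rather not plot by hand, a few lines of Python reproduce the table (using base-10 logs, which is what the posted values match):

```python
import math

# Recompute a few rows of the table above with base-10 logs.
for f_val in (0.01, 0.5, 0.9, 0.99):
    print(f_val, -math.log10(1 - f_val))

# Spot-check against the posted values.
assert abs(-math.log10(1 - 0.5) - 0.30102999566) < 1e-9
assert abs(-math.log10(1 - 0.9) - 1.0) < 1e-9
assert abs(-math.log10(1 - 0.99) - 2.0) < 1e-9
```

As f approaches 1, the argument of the log approaches 0 and -log(1-f) blows up toward infinity, which is exactly the steep right-hand side of the curve in the lecture slide.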
