Cost Function of Logistic Regression, Binary Classification

I have a question regarding the logistic regression cost function.

Should the x value from the training data set be placed in both logarithm functions, log(f(x)) and log(1-f(x)), of the cost function, or should it only be placed in one of them based on the actual output y for that x, similar to how we did it in linear regression?

The complete cost formula for logistic regression is the negative average over samples of:
y_target * log(f(x)) + (1 - y_target) * log(1 - f(x))
where f(x) is y_predicted (the model's output for that x).

This formula already “chooses” which log function to use by making the results of one of those log functions 0. Therefore, you can place the x value in both the log functions.

The y_target values can only be 0 or 1.

If y_target = 0, then y_target * log(f(x)) would be 0, and so you would be left with (1-y_target) * log(1-f(x)) or just log(1-f(x)) for that sample.

If y_target = 1, then (1-y_target) * log(1-f(x)) would be 0, and so you would be left with y_target * log(f(x)) or just log(f(x)) for that sample.
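To see this selection in code, here is a minimal Python sketch (the labels and f(x) values below are made-up examples, not course data):

```python
import numpy as np

def logistic_cost(y_target, f_x):
    # Average cross-entropy cost; f_x holds model outputs strictly in (0, 1).
    return -np.mean(y_target * np.log(f_x) + (1 - y_target) * np.log(1 - f_x))

y_target = np.array([1, 0])      # binary labels from the data
f_x      = np.array([0.9, 0.2])  # hypothetical model outputs f(x)

# Sample 1 (y_target=1): only log(f_x) survives.
# Sample 2 (y_target=0): only log(1 - f_x) survives.
print(logistic_cost(y_target, f_x))  # -(log(0.9) + log(0.8)) / 2 ≈ 0.164
```

Note that x goes into the model in both terms; it is y_target that zeroes out one of them.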


You answered me with something that Prof. Andrew told us, and that's already within my knowledge.

my question :
y_target * log(f(x)) + (1 - y_target) * log(1 - f(x))

Suppose x = 2. Then would it be
a) y_target * log(f(x=2)) + (1 - y_target) * log(1 - f(x=2)), with x = 2 placed in both log functions?
or
b) either log(f(x=2)) or log(1 - f(x=2)), depending on the exact output between 0 and 1?

It’s not entirely clear to me what you’re asking.

If x=2 in that sample, then you can always use f(x=2) in both the log functions. The formula will “choose” which log function is used for the cost. Which log function is chosen depends on y_target, it does not depend on x or f(x).

Although they may look similar, y_target is a completely different value from y_prediction (or f(x)).

The y_target is provided in the data, and must be exactly 0 or exactly 1. For example, say you are using logistic regression to decide if an image contains a dog or does not contain a dog. In your data, you would have images and labels for “contains dog” or “no dog”. That label is y_target, and each image can either contain (y_target=1) or not contain (y_target=0) a dog. In this case, y_target cannot be 0.5, since it is a yes/no (binary) label.

The y_prediction (or f(x)), on the other hand, is computed/output by the model, and is different from y_target. It is possible for f(x) to be any real number between 0 and 1 (assuming you have a sigmoid activation function). For example, y_prediction, or f(x), can be 0.6. The y_prediction can be thought of as the probability that an image contains a dog.
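As a concrete illustration (a small sketch; the 0.6 prediction is just the example value above, and the label is an assumed one):

```python
import math

y_target = 1   # label from the data: the image contains a dog
f_x = 0.6      # model output: estimated probability of "dog"

# Only the y_target=1 term survives for this sample:
loss = -(y_target * math.log(f_x) + (1 - y_target) * math.log(1 - f_x))
print(loss)    # -log(0.6) ≈ 0.511
```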


I’ll do a little guessing here as to what your question is.

Yes, x is used in both places where f(x) is noted.

  • f(x) is the sigmoid of (x*w + b), and those values will be between 0 and 1, but never exactly equal either (see the sketch below).

  • The y_target values are limited to 0 or 1 - there are no in-between values.

That said, I don’t know exactly where this equation comes from.
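To illustrate the first point, here is a minimal sketch (the w and b values are arbitrary, chosen only to show the range of the sigmoid):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 1.5, -0.5                  # arbitrary illustrative parameters
for x in [-10.0, 0.0, 2.0, 10.0]:
    print(x, sigmoid(w * x + b))  # values approach 0 and 1 but never reach them
```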

Hi Tom,
I will return to this discussion after some time…

Hi mentor,
I understand this is a math question, yet it will help me understand why the definition of the logistic loss function works. Could you explain why the graph of -log(1-f) looks the way it does (when 0 < f < 1, we get 0 < -log(1-f) < infinity)?


Many thanks
Christina

We don’t use f values of exactly 0 and 1, since the log(f) and log(1-f) functions explode there.

Just use values that are extremely close to 0 and 1.

The graph looks like that because that’s just the way the formula and the log function works!

It sounds like you want to be convinced that for 0 < f < 1, we get 0 < -log(1-f) < infinity. You can actually plot this out by hand (I calculated the log values here using Google search, which computes base-10 logs; the natural log used in the actual loss has the same shape).

f = 0.01, then -log(1-f) = 0.0043648054
f = 0.1, then -log(1-f) = 0.04575749056
f = 0.2, then -log(1-f) = 0.096910013
f = 0.3, then -log(1-f) = 0.15490195998
f = 0.4, then -log(1-f) = 0.22184874961
f = 0.5, then -log(1-f) = 0.30102999566
f = 0.6, then -log(1-f) = 0.39794000867
f = 0.7, then -log(1-f) = 0.52287874528
f = 0.8, then -log(1-f) = 0.69897000433
f = 0.9, then -log(1-f) = 1
f = 0.99, then -log(1-f) = 2
f = 0.999999, then -log(1-f) = 5.99999999999
f = 0.99999999999999, then -log(1-f) = 14.0003472608

If you plot f against -log(1-f) by putting the dots on a grid and drawing a line through them (or use Excel/Google Sheets ;)), you will see it looks like the curve in the lecture slide.
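If you'd rather not plot by hand, here is a minimal matplotlib sketch (using the same base-10 log as the table above, purely for consistency with those numbers):

```python
import numpy as np
import matplotlib.pyplot as plt

f = np.linspace(0.001, 0.999, 500)  # stay away from exactly 0 and 1, where the log explodes
plt.plot(f, -np.log10(1 - f))
plt.xlabel("f")
plt.ylabel("-log(1 - f)")
plt.title("Loss term for y_target = 0")
plt.show()
```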
