I have a question regarding the logistic regression cost function.
Should the x value from the training data set be placed in both logarithm terms of the cost function, log(f(x)) and log(1-f(x)), or should it only be placed in one of them based on the actual label y for that x, similar to how we did it in linear regression?
The complete formula for the logistic regression cost is the average over samples of -(y_target * log(f(x)) + (1 - y_target) * log(1 - f(x))),
where f(x) is y_predicted (the model's output for that x).
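In case it helps to see it in code, here is a minimal NumPy sketch of that average (the name `logistic_cost` is just illustrative, not something from the course):

```python
import numpy as np

def logistic_cost(y_target, f_x):
    """Average logistic (cross-entropy) cost over all samples.

    y_target: array of 0/1 labels from the data
    f_x:      array of model outputs, each strictly between 0 and 1
    """
    y_target = np.asarray(y_target, dtype=float)
    f_x = np.asarray(f_x, dtype=float)
    per_sample = y_target * np.log(f_x) + (1 - y_target) * np.log(1 - f_x)
    return -np.mean(per_sample)  # negative average over all samples

# Two samples: a positive predicted at 0.9 and a negative predicted at 0.2
print(logistic_cost([1, 0], [0.9, 0.2]))  # ~0.164, a small cost
```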
This formula already “chooses” which log term to use by zeroing out the other one. Therefore, you can place the x value in both log terms.
The y_target values can only be 0 or 1.
If y_target = 0, then y_target * log(f(x)) would be 0, and so you would be left with (1-y_target) * log(1-f(x)) or just log(1-f(x)) for that sample.
If y_target = 1, then (1-y_target) * log(1-f(x)) would be 0, and so you would be left with y_target * log(f(x)) or just log(f(x)) for that sample.
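Here is a tiny sketch (with an illustrative f(x) value) showing how the multiplication by y_target and (1 - y_target) does the “choosing”:

```python
import numpy as np

f_x = 0.7  # the model's output for this sample, placed in BOTH log terms

for y_target in (0, 1):
    term1 = y_target * np.log(f_x)            # zeroed out when y_target = 0
    term2 = (1 - y_target) * np.log(1 - f_x)  # zeroed out when y_target = 1
    loss = -(term1 + term2)
    print(f"y_target={y_target}: term1={term1:.4f}, term2={term2:.4f}, loss={loss:.4f}")
```

For y_target = 0 only term2 survives, and for y_target = 1 only term1 survives, even though f(x) was plugged into both.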
My question:
y_target * log(f(x)) + (1 - y_target) * log(1 - f(x))
Suppose x = 2. Would it then be
a) y_target * log(f(x=2)) + (1 - y_target) * log(1 - f(x=2)), with x = 2 placed in both log terms?
or
b) either log(f(x=2)) or log(1 - f(x=2)), depending on the exact output between 0 and 1?
It’s not entirely clear to me what you’re asking.
If x=2 in that sample, then you can always use f(x=2) in both the log functions. The formula will “choose” which log function is used for the cost. Which log function is chosen depends on y_target; it does not depend on x or f(x).
Although they may look similar, y_target is a completely different value from y_prediction (or f(x)).
The y_target is provided in the data, and must be exactly 0 or exactly 1. For example, say you are using logistic regression to decide if an image contains a dog or does not contain a dog. In your data, you would have images and labels for “contains dog” or “no dog”. That label is y_target, and each image can either contain (y_target=1) or not contain (y_target=0) a dog. In this case, y_target cannot be 0.5, since it is a yes/no (binary) label.
The y_prediction (or f(x)), on the other hand, is computed/output by the model, and is different from y_target. It is possible for f(x) to be any real number between 0 and 1 (assuming you have a sigmoid activation function). For example, y_prediction, or f(x), can be 0.6. The y_prediction can be thought of as the probability that an image contains a dog.
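A small sketch may make the distinction concrete; the z value here is hypothetical, standing in for w * x + b with some weights:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# y_target comes from the data and is exactly 0 or 1.
y_target = 1          # e.g. the label "contains dog"

# f(x) is computed by the model and lies strictly between 0 and 1.
z = 0.4               # hypothetical w * x + b for some weights w, b
f_x = sigmoid(z)
print(f_x)            # ~0.599: the predicted probability of "contains dog"
```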
Hi mentor
I understand this is a math question, but it will help me understand why the definition of the logistic loss function works. Could you explain to me why the -log(1-f) graph looks the way it does (when 0 < f < 1, then 0 < -log(1-f) < infinity)?
The graph looks like that because that’s just the way the formula and the log function work!
It sounds like you want to be convinced of the graph for the case that if 0 < f < 1, then 0 < -log(1-f) < infinity. You can actually try to plot this out by hand (I calculated the log values below using Google search, which computes the base-10 log; the natural log gives the same shape).
f = 0.01, then -log(1-f) = 0.0043648054
f = 0.1, then -log(1-f) = 0.04575749056
f = 0.2, then -log(1-f) = 0.096910013
f = 0.3, then -log(1-f) = 0.15490195998
f = 0.4, then -log(1-f) = 0.22184874961
f = 0.5, then -log(1-f) = 0.30102999566
f = 0.6, then -log(1-f) = 0.39794000867
f = 0.7, then -log(1-f) = 0.52287874528
f = 0.8, then -log(1-f) = 0.69897000433
f = 0.9, then -log(1-f) = 1
f = 0.99, then -log(1-f) = 2
f = 0.999999, then -log(1-f) = 5.99999999999
f = 0.99999999999999, then -log(1-f) = 14.0003472608
If you plot out f and -log(1-f) by putting the dots on a grid and drawing a line through them (or use an Excel/Sheets program, or the short script below ;)), you will see it looks like the curve in the lecture slide.
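If you’d rather let the computer do the plotting, here is a short matplotlib sketch that draws the same curve (using the base-10 log to match the numbers above; the natural log gives the same shape):

```python
import numpy as np
import matplotlib.pyplot as plt

f = np.linspace(0.01, 0.9999, 500)      # stay strictly below 1
plt.plot(f, -np.log10(1 - f))           # base-10 log, matching the table above
plt.xlabel("f")
plt.ylabel("-log(1 - f)")
plt.title("Loss term for y_target = 0")
plt.show()
```

Notice how the curve hugs 0 when f is small and shoots toward infinity as f approaches 1, which is exactly the penalty you want when the true label is 0 but the model predicts close to 1.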