Explain threshold in logistic regression

After finding the optimal parameters in logistic regression using gradient descent (that is, parameters for which the cost function is low), why is the threshold 0.5? Can’t it be any other value?
It would be nice if someone could give an explanation of this.

The values for True and False are 1.0 and 0.0.

The threshold is exactly halfway between them.

Thank you @TMosh. If possible, could I get some reading material regarding this threshold?

What sort of additional information are you looking for?

It’s not a deep or subtle point. The idea is that the output of sigmoid looks like a probability, right? So we treat it as the probability that the answer is “Yes” for a given sample. So if it is > 0.5, that’s a positive answer, else it is interpreted as the model predicting “No” for that sample.
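
As a minimal sketch of that rule (the function names `sigmoid` and `predict` and the sample values of z are made up here for illustration, not taken from any course code), the thresholding step could look like this:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(z, threshold=0.5):
    """Return 1 ("Yes") if the sigmoid output exceeds the threshold, else 0 ("No")."""
    return 1 if sigmoid(z) > threshold else 0

# Hypothetical values of z = w^T x + b for three samples
zs = [-2.0, 0.3, 4.0]
preds = [predict(z) for z in zs]
print(preds)
```

The `threshold` argument is exposed only to make the default of 0.5 explicit; the rule described above always uses 0.5.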

Does 0.5 work all the time, or does the value need to be adjusted based on accuracy? For instance, one could use the ROC curve to tune the threshold. Correct me if I’m wrong. Thank you.

Yes, 0.5 works all the time. If the accuracy is bad, then either you need more or better training data or Logistic Regression is not going to be good enough for your problem and you need to consider a real Neural Network (stay tuned for that).

The point is that Logistic Regression can only do “linear separations”: the decision boundary looks like a hyperplane that is expressed by:

$$w^T \cdot X + b = 0$$

Notice that sigmoid(0) = 0.5. So whether LR will do a good job or not depends on whether your actual data is linearly separable. Sometimes that is the case and sometimes it’s not …
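
To spell out the step from sigmoid(0) = 0.5 to the hyperplane, here is a short derivation. It assumes z = w^T \cdot X + b and writes t for the threshold (the symbols \sigma and t are introduced here for illustration):

$$\sigma(z) = \frac{1}{1 + e^{-z}} > t \iff e^{-z} < \frac{1 - t}{t} \iff z > \ln\frac{t}{1 - t}, \qquad 0 < t < 1$$

With t = 0.5 the right-hand side is \ln 1 = 0, so predicting “Yes” when sigmoid(z) > 0.5 is the same as asking whether w^T \cdot X + b > 0. Any other t between 0 and 1 would give a parallel hyperplane, w^T \cdot X + b = \ln\frac{t}{1 - t}, so the boundary stays linear.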

Thank you @paulinpaloalto. So when z = 0, sigmoid(z) = 0.5, and that’s the hyperplane that separates positives and negatives, i.e. 1 and 0.

Yes, that is the point: the decision boundary is at 0.5 for the output of sigmoid.

Awesome, thank you!! @paulinpaloalto @TMosh

A little addition. I advise you to read about precision and recall 🙂
As explained, the sigmoid output can be interpreted as the probability of belonging to class 1, with 0.5 as the default cutoff. But what if you want to reduce the number of misclassified 0’s or 1’s? Then you can tune this value to get fewer misclassifications on one side in exchange for more on the other.
This is important, for instance, in medicine, if you don’t want to miss a tumor. You would prefer to have more false alarms than to miss one, so you may want to lower the threshold to 0.4, for instance, if 1 is the tumor label.
The same goes for a model that says hello at the door when someone is there. You don’t want it to activate every time there is a cat, so you may want to tune your threshold too.
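
To make the trade-off concrete, here is a small sketch in plain Python. The labels and model outputs below are invented for illustration (1 stands for the tumor label), and `precision_recall` is a hypothetical helper, not a library function:

```python
def precision_recall(y_true, probs, threshold):
    """Compute (precision, recall) for class 1 at a given threshold."""
    preds = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up ground truth and sigmoid outputs for eight samples
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [0.10, 0.30, 0.45, 0.42, 0.60, 0.55, 0.80, 0.95]

for t in (0.5, 0.4):
    p, r = precision_recall(y_true, probs, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.4 catches the one tumor that was missed (recall goes from 0.75 to 1.0) at the cost of an extra false alarm (precision drops from 0.75 to about 0.67). Real data will behave differently, so the threshold is usually chosen by looking at these numbers on a validation set.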
