The value of the linear function is not used directly to make the decision: the output of sigmoid is. So a linear output between 0 and 0.5 does not predict “False”. The point is that sigmoid(0) = 0.5 and sigmoid is monotonically increasing. So if the linear value is ≤ 0, then the sigmoid output will be ≤ 0.5 and we interpret that as predicting “False”. If the linear output is > 0, then the sigmoid output is > 0.5 and we predict “True”. We are interpreting the sigmoid output as the probability of “True”.
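To make that concrete, here is a minimal sketch in plain Python. The helper name `sigmoid` and the sample values in `z_values` are just illustrative choices of mine, not anything from the course code. It checks that thresholding the sigmoid output at 0.5 gives the same answer as thresholding the linear output at 0.

```python
import math

def sigmoid(z):
    # Standard logistic function: maps any real z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative linear outputs, including values between 0 and 0.5.
z_values = [-2.0, -0.5, 0.0, 0.3, 0.5, 2.0]

# Decision made on the sigmoid output (interpreted as P("True")).
preds_sigmoid = [sigmoid(z) > 0.5 for z in z_values]

# Equivalent decision made directly on the linear output.
preds_linear = [z > 0 for z in z_values]

for z, p in zip(z_values, preds_sigmoid):
    print(f"z = {z:5.2f}  sigmoid(z) = {sigmoid(z):.4f}  predict True? {p}")
```

Note that z = 0.3 lies between 0 and 0.5, but its sigmoid output is above 0.5, so it is predicted “True”.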
One thing some people have suggested is: “why don’t we use tanh instead of sigmoid, and then treat tanh(z) > 0 as True?” That is discussed on this thread.
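As a quick aside (this is my own sketch, not taken from that thread): because tanh(z) = 2 · sigmoid(2z) − 1, the rule “tanh(z) > 0” picks out exactly the same inputs as “sigmoid(z) > 0.5”. Both reduce to z > 0. The names `sigmoid` and `z_values` below are again only illustrative.

```python
import math

def sigmoid(z):
    # Standard logistic function.
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative linear outputs.
z_values = [-2.0, -0.5, 0.0, 0.3, 0.5, 2.0]

# tanh is a rescaled, shifted sigmoid: tanh(z) = 2 * sigmoid(2z) - 1.
identity_holds = all(
    math.isclose(math.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0, abs_tol=1e-12)
    for z in z_values
)

# The two decision rules agree on every input.
preds_tanh = [math.tanh(z) > 0 for z in z_values]
preds_sigmoid = [sigmoid(z) > 0.5 for z in z_values]

print(identity_holds, preds_tanh == preds_sigmoid)
```

So the two rules give the same yes/no predictions. What changes is that the tanh output lies in (−1, 1) and so can no longer be read directly as a probability.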