The sigmoid function is 1 / (1 + e^(-z)), where z = w*x+b. Why is the z term set to w*x+b? I understand why we use w*x+b in linear regression, since that is the equation for a line, but I do not understand why we are able to use it in the sigmoid. Any explanation is welcome!
The logistic prediction is based on a linear prediction which is passed through a sigmoid function, so that its range is limited to the range of 0 to 1. This matches the binary (0 or 1) values that are the labels.
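To make that concrete, here is a minimal sketch of the two steps for a single feature. The names `sigmoid` and `predict_proba` and the values of `w` and `b` are my own, chosen only for illustration; they are not from the course code.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """Logistic prediction: the linear score z = w*x + b passed through the sigmoid."""
    z = w * x + b
    return sigmoid(z)

# Hypothetical weight and bias, chosen only for illustration
w, b = 2.0, -1.0
for x in (-3.0, 0.5, 4.0):
    # The linear score can be any real number, but the output stays between 0 and 1
    print(x, w * x + b, predict_proba(x, w, b))
```

The linear part w*x+b is unbounded, so on its own it cannot be read as a value between 0 and 1. The sigmoid only rescales it; the "line" is still doing the work underneath.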
The sigmoid also turns out to be a good choice when you have to compute the partial derivatives of the cost (i.e. compute the gradients), so you can use gradient descent to find the weights that minimize the cost.
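As a rough sketch of what that looks like in practice: assuming the cost is the average log loss (my assumption, the post above does not name a cost), the sigmoid's derivative cancels neatly and the gradients reduce to the average of (prediction - label) times the input. The toy data, the learning rate `alpha`, and the function names below are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, xs, ys):
    """Average log loss (assumed cost function)."""
    total = 0.0
    for x, y in zip(xs, ys):
        f = sigmoid(w * x + b)
        total += -(y * math.log(f) + (1 - y) * math.log(1 - f))
    return total / len(xs)

def gradients(w, b, xs, ys):
    """Partial derivatives of the average log loss with respect to w and b."""
    m = len(xs)
    dw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / m
    db = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / m
    return dw, db

# Toy data, made up for illustration: small x -> label 0, large x -> label 1
xs = [0.5, 1.5, 2.5, 3.5]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0
alpha = 0.1  # learning rate (hypothetical value)
initial_cost = cost(w, b, xs, ys)

for _ in range(1000):
    dw, db = gradients(w, b, xs, ys)
    w -= alpha * dw  # step against the gradient
    b -= alpha * db

final_cost = cost(w, b, xs, ys)
print(initial_cost, final_cost)
```

Note that the gradient expressions have the same form as in linear regression, with the sigmoid output standing in for the linear prediction.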
If you’re looking for some intuition on why logistic regression works, I recommend watching (or rewatching) the video on decision boundaries.
The video covers how the “line” from the equation can be visualized as a decision boundary for logistic regression. Basically, the line defined by w*x+b can be used to divide up the input into two output classes.
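Here is a small sketch of that idea with two features. It assumes the usual 0.5 threshold on the sigmoid output; since sigmoid(z) is at least 0.5 exactly when z is at least 0, the boundary is where w1*x1 + w2*x2 + b = 0. The parameter values and the `classify` helper are hypothetical, picked so the boundary is the line x1 + x2 = 3.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(x1, x2, w1, w2, b, threshold=0.5):
    """Return class 1 if the sigmoid of the linear score reaches the threshold, else 0."""
    z = w1 * x1 + w2 * x2 + b
    return 1 if sigmoid(z) >= threshold else 0

# Hypothetical parameters: the decision boundary is the line x1 + x2 = 3
w1, w2, b = 1.0, 1.0, -3.0

print(classify(1.0, 1.0, w1, w2, b))  # z = -1, on the class 0 side of the line
print(classify(2.0, 2.0, w1, w2, b))  # z = +1, on the class 1 side of the line
```

Checking the sign of z gives the same answer as thresholding the sigmoid at 0.5, which is why the line alone is enough to describe where the prediction flips.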
This visualization doesn’t necessarily work for higher-dimensional or more complicated systems, but I think it helps with the fundamental intuition behind it.
Rewatching with this question in mind did help, thanks for that. I am more interested now in how anyone figured this out in the first place.
Clever people have been working on logistic regression since at least the 1950s.
Logistic regression is based on the original perceptron concept, which was a simplified simulation of the actions of a biological neuron.
That is awesome! I look forward to learning more of this stuff, including the long history.
The courses focus on concepts and implementation, but don’t discuss much about how we got here.