Why is the sigmoid function's z term equal to "w*x+b" in logistic regression?

The sigmoid function is 1 / (1 + e^(-z)) where z = w*x+b. Why is the z term set to w*x+b? I understand why we use w*x+b in linear regression, since that is the equation for a line, but I do not understand why we are able to use it in the sigmoid. Any explanation is welcome!

The logistic prediction is a linear prediction passed through a sigmoid function, so that its output is limited to the range 0 to 1. This matches the binary (0 or 1) values that are the labels.
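A minimal sketch of that composition, assuming a NumPy setup and illustrative values for `w`, `b`, and `x` (none of these come from the course materials):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    # Linear prediction z = w*x + b, then passed through the sigmoid
    z = np.dot(w, x) + b
    return sigmoid(z)

# Illustrative weights and input; output is a probability-like value in (0, 1)
p = predict(np.array([2.0, 1.0]), np.array([0.5, -0.25]), 0.1)
print(p)
```

Whatever values z takes, the sigmoid maps it into (0, 1), which is what lets the result be read as an estimated probability of the positive class.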

It also turns out this is a good choice when you have to compute the partial derivatives of the cost (i.e., compute the gradients), so you can use gradient descent to find the weights that minimize the cost.
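To see why the pairing is convenient: with the sigmoid activation and the log-loss cost, the gradients reduce to the same simple "error times input" form as in linear regression. A hedged sketch (the vectorized form here is one common way to write it, not necessarily the course's exact notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradients(X, y, w, b):
    # Partial derivatives of the log-loss cost with respect to w and b.
    # X: (m, n) examples, y: (m,) binary labels, w: (n,) weights, b: scalar.
    m = X.shape[0]
    err = sigmoid(X @ w + b) - y      # per-example prediction error
    dw = (X.T @ err) / m              # dJ/dw
    db = np.mean(err)                 # dJ/db
    return dw, db
```

A gradient-descent loop then just repeats `w -= alpha * dw; b -= alpha * db` until the cost stops decreasing.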


If you’re looking for some intuition on why logistic regression works, I recommend watching (or rewatching) the video on decision boundaries:

The video covers how the “line” from the equation can be visualized as a decision boundary for logistic regression. Basically, the line defined by the w*x+b can be used to divide up the input into two output classes.
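One way to see the division concretely: the boundary is the set of points where w*x + b = 0, because there the sigmoid outputs exactly 0.5; on one side z > 0 (prediction above 0.5), on the other z < 0. A tiny sketch with made-up weights:

```python
import numpy as np

w = np.array([1.0, 1.0])   # illustrative weights
b = -3.0                   # boundary is the line x1 + x2 = 3

def classify(x):
    # Class is decided by which side of the line w*x + b = 0 the point is on
    return int(np.dot(w, x) + b > 0)

print(classify(np.array([1.0, 1.0])))  # z = -1 -> class 0
print(classify(np.array([2.0, 2.0])))  # z = +1 -> class 1
```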

This visualization doesn’t necessarily work for higher-dimensional or more complicated systems, but I think it helps with the fundamental intuition behind it.

Rewatching with this question in mind did help, thanks for that. I am more interested now in how anyone figured this out in the first place.

Clever people have been working on logistic regression since at least the 1950s.

Logistic regression is based on the original perceptron concept, which was a simplified simulation of the actions of a biological neuron.


That is awesome! I look forward to learning more of this stuff, including the long history.

The courses focus on concepts and implementation, but don’t discuss much of how we got here.

So logistic regression uses the linear regression function, to which the sigmoid function is then applied to get a prediction between 0 and 1?

Not only the decision boundary but logistic regression itself is then based on a linear function w*x+b, because ultimately logistic regression = sigmoid(linear regression). So can we say that logistic regression = function(linear regression)? The reason this is a little difficult to comprehend is that in logistic regression you intend to predict a classification, which is typically binary, whereas in linear regression you intend to predict a value that is non-binary.

You are correct about the forms of the activation functions.

Where the difference lies is in the cost functions that use the activations.

  • In linear regression, we’re trying to create a model that fits the data points.
  • In logistic regression, we’re trying to create a boundary that separates the data points into True and False regions.

These two tasks require completely different cost functions.
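A brief sketch of the two costs side by side (the function names here are my own, not from the course): squared error measures distance from the fitted values, while log loss heavily penalizes confident wrong classifications.

```python
import numpy as np

def mse_cost(y_hat, y):
    # Squared-error cost used for linear regression: fit the data points
    return np.mean((y_hat - y) ** 2)

def log_loss(y_hat, y, eps=1e-12):
    # Cross-entropy cost used for logistic regression: separate the classes.
    # Clipping avoids log(0) for predictions at exactly 0 or 1.
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```

Log loss also keeps the logistic cost convex in the weights, which squared error would not, so gradient descent can find the global minimum.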
