Looking into the logistic regression function, it makes sense intuitively why we use it. It has some neat properties for classification, such as the output always being between 0 and 1. What I don’t get, however, is the motivation for the wx + b argument in the function. Why do we put the linear equation in as the argument? Is there a good way to think about or interpret that wx + b, other than “just a way for us to have multiple parameters to tune”?
It’s not just wx + b. The logistic f_wb is that linear relationship with the sigmoid applied.
This combination has some nice properties when we obtain the gradients (the partial derivatives) of the logistic cost function.
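To make that concrete, here is a quick numeric sketch (plain Python, the function names are my own) checking the nice property: for the cross-entropy cost, the analytic gradients come out as (f_wb − y)·x and (f_wb − y), the same tidy form as linear regression’s gradients, and they agree with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, x, y):
    # Logistic (cross-entropy) loss for a single 1-D example
    f = sigmoid(w * x + b)
    return -y * math.log(f) - (1 - y) * math.log(1 - f)

w, b, x, y = 0.5, -0.25, 2.0, 1.0
f = sigmoid(w * x + b)

# Analytic gradients: dJ/dw = (f - y) * x, dJ/db = (f - y)
grad_w = (f - y) * x
grad_b = f - y

# Finite-difference check of both gradients
eps = 1e-6
num_w = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
num_b = (cost(w, b + eps, x, y) - cost(w, b - eps, x, y)) / (2 * eps)

assert abs(grad_w - num_w) < 1e-6
assert abs(grad_b - num_b) < 1e-6
```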
Hi @Michael_Khachatrian, great question! Yes, there’s more to the linear equation (wx + b) in logistic regression than just providing multiple parameters to tune. The linear equation captures the relationship between the input features and the output probability. Here’s a more intuitive way to think about it:

Linear combination of features: The term “wx” represents the weighted sum of input features (x), where “w” are the corresponding weights. Each weight quantifies the importance or contribution of the corresponding feature in determining the output. This linear combination helps to model the relationship between the input features and the output. A positive weight suggests a positive relationship between the feature and the output, while a negative weight indicates a negative relationship.

Bias term: The “b” term, also known as the bias, acts as a threshold or an offset. It helps adjust the output of the linear combination of features. It can be thought of as a baseline value that shifts the decision boundary, allowing the model to adapt to different data distributions.

Nonlinear transformation: The logistic function (also known as the sigmoid function) is applied to the linear equation (wx + b) to transform the result into a probability value between 0 and 1. The logistic function squashes the input into the range (0, 1), which is ideal for binary classification tasks.
To sum up, the linear equation (wx + b) in logistic regression captures the relationship between the input features and the output, while the logistic function ensures the result stays within the range (0, 1) so it can be read as a probability. Together they let the model weigh each feature’s contribution and still produce a valid probability, which is exactly what a classification task needs.
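The three points above can be sketched in a few lines (a toy example with made-up numbers and my own function names, not from the course): a weighted sum of features, a bias offset, then the sigmoid squashing the result into (0, 1).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x, b):
    # Linear combination: weighted sum of the input features
    z = sum(wi * xi for wi, xi in zip(w, x))
    # Sigmoid squashes the result into the open interval (0, 1)
    return sigmoid(z + b)

w = [1.5, -2.0]   # positive weight pulls toward class 1, negative toward class 0
x = [0.8, 0.3]

p = predict(w, x, b=0.0)
# Shifting the bias b moves the decision boundary without touching the weights,
# lowering the predicted probability for the same input
p_shifted = predict(w, x, b=-1.0)

assert 0.0 < p_shifted < p < 1.0
```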
I hope this helps!!
Thank you for the answer. I guess my confusion is more about why we apply the sigmoid transformation to the linear equation specifically.
Two takeaways seem to be the usefulness of the sigmoid for binary classification due to its shape, and the properties of the derivative (I worked out the partial derivative of the function myself and see what you mean).
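For reference, writing $f = \sigma(wx + b)$ and using the standard cross-entropy cost $J = -y\log f - (1-y)\log(1-f)$, the tidy results that derivation gives are:

$$\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)$$

$$\frac{\partial J}{\partial w} = (f - y)\,x, \qquad \frac{\partial J}{\partial b} = f - y$$

The $\sigma'$ terms cancel against the log’s derivative, which is why the gradients end up in the same simple form as linear regression’s.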
So did somebody one day just take a look at the sigmoid function and think “hey, that looks like it could be useful for binary classification due to its shape, and let me add in the linear equation component so the correlation of the features with the output has an influence on the model,” and come up with logistic regression? (I’m probably oversimplifying a lot.)
Thank you for this answer this makes sense. So if I’m understanding right, a way to think about it is that the wx+b part captures the relationship between features and output (since linear regression is a good way of measuring correlation), and the sigmoid transformation is for making the model more useful for classification purposes by putting the values between 0 and 1.
Logistic regression evolved from the “perceptron”, whose origin goes back to the 1940s.
From the Wikipedia article: the perceptron maps its weighted input to a binary output via a hard threshold (a step function).
If you substitute the logistic sigmoid for this binary mapping, you get a continuous output whose gradients are well behaved. This is very handy for solutions that use the gradients, giving us logistic regression.
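A quick way to see why that substitution matters (a toy sketch, function names my own): the step function’s derivative is zero everywhere it is defined, so gradient descent gets no signal from it, while the sigmoid’s derivative is positive everywhere.

```python
import math

def step(z):
    # Perceptron-style binary mapping
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

eps = 1e-6
z = 0.7  # any point away from the step's discontinuity at z = 0

# Numerical derivative of the step function: exactly zero here
d_step = (step(z + eps) - step(z - eps)) / (2 * eps)
# Numerical derivative of the sigmoid: strictly positive
d_sig = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

assert d_step == 0.0
assert d_sig > 0.0
```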
Yes!!! Exactly. Those are the two main concepts you need to understand the logic behind logistic regression.