Is logistic regression simply scaling the linear regression output into the range 0 to 1?

From the lectures I have understood that for logistic regression, we simply take the linear regression output and feed it to the sigmoid function, which in turn produces an output between 0 and 1.
If that is the case, why can’t we use feature scaling techniques to do the same?

Feature scaling would be an operation performed on x before it is linearly transformed by W and b. The sigmoid function ensures that no matter what values are in x, W, or b, the activation value sigmoid(Wx + b) will always lie between 0 and 1. No amount of feature scaling, without a sigmoid function at the end, would be able to guarantee an activation value between 0 and 1.
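A minimal sketch of that guarantee (the weights and inputs below are arbitrary values chosen for illustration, not anything from the course):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and one input sample.
W = np.array([3.0, -5.0])
b = 10.0
x = np.array([2.0, 7.0])

z = W @ x + b          # z = -19.0, far outside [0, 1]
a = sigmoid(z)         # a is still strictly between 0 and 1
print(z, a)
```

Whatever z comes out of the linear part, sigmoid maps it into (0, 1), so no assumption about the range of Wx + b is needed.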

1 Like

Thanks @bs80
But what I am trying to understand is feature scaling applied to the linear regression output itself.
We could simply apply max scaling to the linear regression output, which would scale it down/up into the range 0 to 1.
If we can achieve the same effect with max scaling, then why the sigmoid function?

I can’t claim to be an expert on this, but I’m guessing that there isn’t really a sure way to know what the minimum and maximum possible value for Wx+b can be. In theory, there is nothing that controls the values of x, W, and b such that Wx+b is confined to any specific range, so there is no way to determine a min and a max in order to scale the value to the interval between 0 and 1.
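To illustrate the problem (with made-up numbers, not anything from the course): min-max scaling needs a fixed min and max, and any choice based on outputs seen so far breaks as soon as a new sample lands outside that range.

```python
import numpy as np

# Hypothetical linear-regression outputs (Wx + b) observed so far.
z_train = np.array([-4.0, -1.0, 2.0, 5.0])

# Min-max scaling derived from those observed outputs.
z_min, z_max = z_train.min(), z_train.max()
scale = lambda z: (z - z_min) / (z_max - z_min)

print(scale(z_train))   # these values all land in [0, 1]

# But a new sample can produce an output outside the observed range,
# and the "scaled" value escapes [0, 1] -- no longer a valid probability.
z_new = 12.0
print(scale(z_new))
```

Since Wx + b is unbounded in general, there is no min/max you can fix in advance, whereas sigmoid needs none.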

1 Like

A couple of thoughts:

  1. In linear regression, typically we don’t use feature scaling on the output.

  2. The fundamental difference between linear regression and logistic regression is how the cost functions are designed.

  • The cost function for linear regression aims to create a model of the output data.

  • The cost function for logistic regression aims to create a boundary that separates the data into “true” and “false” regions.
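A quick sketch of that difference in cost functions, using a tiny made-up 1-D dataset and arbitrary parameter values (nothing here is from the course materials):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical dataset: one feature x, binary labels y.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = 1.5, 0.0   # arbitrary parameters for illustration

# Linear regression cost: mean squared error on the raw output,
# i.e. it tries to model the output values directly.
y_hat = w * x + b
mse = np.mean((y_hat - y) ** 2)

# Logistic regression cost: binary cross-entropy on the sigmoid output,
# i.e. it rewards putting samples on the correct side of the boundary.
p = sigmoid(w * x + b)
bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(mse, bce)
```

The MSE term penalizes distance from the label values themselves, while the cross-entropy term penalizes predicted probabilities that fall on the wrong side of 0.5, which is what shapes a decision boundary.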

1 Like

Suppose we have a training dataset with two features, X = [(-1, -100); (1, 100)]. Here the first feature is x_1 = (-1, 1), while the second feature is x_2 = (-100, 100). The ranges of these features differ too much, so we can apply feature scaling to x_2 and change its range to x_2’ = (-1, 1). If we treat feature scaling as a function, its input is an array holding one feature, x_2, and its output is a rescaled version of that array, x_2’.

For the sigmoid in logistic regression, the input is a scalar computed from one sample and the parameters, such as w_1*x_11 + w_2*x_12 + b, where (x_11, x_12) is the first sample in the dataset, and the output is a scalar between 0 and 1.

Therefore, the input of feature scaling is an array (a whole feature column), while the input of the sigmoid function is a scalar (the dot product of one sample with the parameters, plus the bias). They are two different functions and have different meanings in practice.
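That array-versus-scalar distinction can be sketched directly, reusing the dataset from the example above (the parameter values are hypothetical, chosen only to make the arithmetic visible):

```python
import numpy as np

def min_max_scale(feature):
    """Feature scaling: maps an ARRAY (one feature column) onto [-1, 1]."""
    return feature / np.abs(feature).max()

def sigmoid(z):
    """Sigmoid: maps a SCALAR (w.x + b for one sample) into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Second feature from the example dataset: x_2 = (-100, 100).
x2 = np.array([-100.0, 100.0])
print(min_max_scale(x2))            # operates on the whole column

# Sigmoid acts on one sample's score, not on a feature column.
w, b = np.array([0.5, 0.01]), 0.0   # hypothetical parameters
sample = np.array([-1.0, -100.0])   # first sample (x_11, x_12)
z = w @ sample + b                  # scalar: 0.5*(-1) + 0.01*(-100) = -1.5
print(sigmoid(z))                   # single value in (0, 1)
```

So feature scaling transforms a column of the dataset before training, while sigmoid transforms one sample's score during prediction.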

1 Like