From the lectures I have understood that for logistic regression, we simply take the linear regression output and feed it to the sigmoid function, which in return gives an output between 0 and 1.
If that is the case, why can’t we use feature scaling techniques to do the same?
Feature scaling would be an operation performed on x before it is linearly transformed by W and b. The sigmoid function ensures that no matter what values are in x, W, or b, the activation value of sigmoid(Wx+b) will always be between 0 and 1. No feature scaling without a sigmoid function at the end would be able to guarantee an activation value between 0 and 1.
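To make the guarantee concrete, here is a minimal sketch (the stable two-branch form of the sigmoid is my own choice, to avoid overflow for large negative inputs) showing that sigmoid(z) stays between 0 and 1 no matter how extreme z = Wx + b is:

```python
import math

def sigmoid(z):
    # Numerically stable logistic sigmoid: maps any real z into [0, 1]
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Even extreme values of z = Wx + b stay bounded
for z in [-1000.0, -5.0, 0.0, 5.0, 1000.0]:
    a = sigmoid(z)
    assert 0.0 <= a <= 1.0
```

Mathematically the output is in the open interval (0, 1); in floating point it can round to exactly 0 or 1 for very large |z|, which is why the assertion uses closed bounds.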
Thanks @bs80
But what I am trying to understand is feature scaling applied to the linear regression output.
We could simply apply max scaling to the linear regression output, which would scale it to lie between 0 and 1.
If we can achieve the same effect with max scaling, then why the sigmoid function?
I can’t claim to be an expert on this, but I’m guessing that there isn’t really a sure way to know what the minimum and maximum possible value for Wx+b can be. In theory, there is nothing that controls the values of x, W, and b such that Wx+b is confined to any specific range, so there is no way to determine a min and a max in order to scale the value to the interval between 0 and 1.
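A small sketch of that failure mode (the scores and weights here are made up for illustration): if we min-max scale using the range of z = Wx + b seen in training, a new sample can fall outside that range, while the sigmoid stays bounded:

```python
import math

def sigmoid(z):
    # Logistic sigmoid, bounded in (0, 1) for any real z
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw scores z = Wx + b observed on the training set
train_z = [-3.0, -1.0, 0.5, 2.0]
z_min, z_max = min(train_z), max(train_z)

def minmax(z):
    # Min-max scaling based on the training range only
    return (z - z_min) / (z_max - z_min)

new_z = 10.0              # a score from an unseen sample, outside the range
scaled = minmax(new_z)    # greater than 1: no longer a valid probability
bounded = sigmoid(new_z)  # still strictly between 0 and 1
```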
A couple of thoughts:

In linear regression, typically we don’t use feature scaling on the output.

The fundamental difference between linear regression and logistic regression is how the cost functions are designed.

The cost function for linear regression aims to create a model of the output data.

The cost function for logistic regression aims to create a boundary that separates the data into “true” and “false” regions.
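The two cost functions mentioned above can be sketched as follows (these are the standard squared-error and cross-entropy forms; the function names are my own):

```python
import math

def mse_cost(y_hat, y):
    # Linear regression: squared-error cost, fits the output values directly
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / (2 * len(y))

def log_loss_cost(y_hat, y):
    # Logistic regression: cross-entropy cost, heavily penalizes confident
    # predictions that land on the wrong side of the true/false boundary
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(y_hat, y)) / len(y)
```

Note that `log_loss_cost` only makes sense when each prediction is already in (0, 1), which is exactly what the sigmoid provides.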
Suppose we have a training dataset with two features, X = [(1, 100); (1, 100)]. Here the first feature is x_1 = (1, 1), while the second feature is x_2 = (100, 100). The ranges of these two features differ too much, so we can apply feature scaling to x_2 and change its range, giving x_2' = (1, 1). If we treat feature scaling as a function, its input is an array containing one feature, x_2, and its output is a rescaled version of that array, x_2'.
For the sigmoid in logistic regression, the input should be a scalar computed from one sample and the parameters, such as w_1 x_11 + w_2 x_12 + b, where (x_11, x_12) is the first sample in the dataset, and the output is a scalar between 0 and 1.
Therefore, the input of feature scaling is an array (one feature across all samples), while the input of the sigmoid function is a scalar (the dot product of one sample with the parameters, plus the bias). They are two different functions with different meanings in practice.
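The shape difference can be sketched directly, using the tiny dataset from the example above (the weight and bias values here are arbitrary, just for illustration):

```python
import math

def max_scale(feature):
    # Feature scaling: input is a whole feature column,
    # output is that column rescaled to at most 1 in magnitude
    m = max(abs(v) for v in feature)
    return [v / m for v in feature]

def sigmoid(z):
    # Sigmoid: input is one scalar z = w1*x11 + w2*x12 + b per sample
    return 1.0 / (1.0 + math.exp(-z))

x2 = [100.0, 100.0]        # the wide-range feature x_2 from the example
x2_scaled = max_scale(x2)  # x_2' = [1.0, 1.0], an array out, array in

w1, w2, b = 0.5, -0.2, 0.1   # arbitrary parameters for illustration
x11, x12 = 1.0, 1.0          # the first sample, after scaling
z = w1 * x11 + w2 * x12 + b  # scalar in
a = sigmoid(z)               # scalar out, between 0 and 1
```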