As far as I understood, feature scaling is not necessary in logistic regression, since the input parameters are always either 0 or 1. However, in the video on gradient descent for logistic regression, Andrew mentioned some basic concepts that carry over from linear regression, and feature scaling was among them. So, did I get it wrong? If so, why would feature scaling be necessary when using logistic regression?

Welcome to the community, and congratulations on your first post!

Regarding your question “Why is feature scaling needed in logistic regression?”

Logistic regression uses the sigmoid function to produce the output of the model.

This function maps any real-valued input to a value between 0 and 1: σ(z) = 1 / (1 + e^(-z)), where z = w · x + b. Because of that squashing, the output is sensitive to the scale of the features that feed into z.
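A quick sketch of that squashing behaviour (a minimal NumPy implementation, not the course's own code):

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued input to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Moderate inputs spread the outputs across the interval;
# large-magnitude inputs saturate at the extremes.
print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # roughly [0.12, 0.5, 0.88]
print(sigmoid(np.array([-100.0, 100.0])))    # essentially [0.0, 1.0]
```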

If the input features have very different ranges, the features with large magnitudes dominate the linear combination z that feeds the sigmoid. Very large positive values of z resolve near 1, very large negative values resolve near 0, and inputs close to zero land around 0.5. As a result, an unscaled feature can have an outsized influence on the resulting probability even when it is not actually that important.
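To make that concrete, here is a toy illustration (the feature values and weights are made up for the example) of one large-scale feature drowning out a small-scale one:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example: two features given equal weights.
w = np.array([1.0, 1.0])
x_raw    = np.array([0.3, 2500.0])  # e.g. a ratio and a raw dollar amount
x_scaled = np.array([0.3, 0.25])    # the same sample after rescaling feature 2

print(sigmoid(w @ x_raw))     # ~1.0: the large feature saturates the sigmoid
print(sigmoid(w @ x_scaled))  # ~0.63: both features actually contribute
```

With the raw values, the first feature is irrelevant: the probability is pinned at 1 no matter what it is. After scaling, both features move the output.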

When all features are scaled to similar ranges, each one contributes to z on a comparable footing, so the sigmoid treats them with similar 'importance'. It also helps the gradient descent point Andrew made: with similar scales, the cost surface is better conditioned and gradient descent converges faster.
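You can see the effect yourself with scikit-learn. This is a hedged sketch on synthetic data (the 1e4 blow-up of one feature is artificial, just to exaggerate the problem): the pipeline with `StandardScaler` typically converges in far fewer iterations than the cap, while the unscaled fit struggles.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data, with one feature
# deliberately blown up to a huge scale.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X[:, 0] *= 1e4

# Same model with and without feature scaling.
raw = LogisticRegression(max_iter=100).fit(X, y)
scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=100)).fit(X, y)

print("accuracy without scaling:", raw.score(X, y))
print("accuracy with scaling:   ", scaled.score(X, y))
print("iterations with scaling: ", scaled.named_steps["logisticregression"].n_iter_[0])
```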