In the feature scaling lecture, the graphs (e.g., size vs. price, age vs. price) seemed to lose their linear relationship after feature scaling. I'm having difficulty reconciling this with the fact that feature scaling makes gradient descent converge faster.
Why do these graphs seem to lose their linear relationship after scaling? If you plot a best-fit line, it seems the line would approximate the training data better if the input variables were not rescaled.
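
One way to test this impression numerically is sketched below. It uses made-up size and price data (not the lecture's dataset) and assumes z-score scaling, which may differ from the scaling used in the lecture. The helper names `zscore` and `r_squared` are only illustrative. It compares the correlation and the best-fit line's R² before and after scaling the input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data (an assumption, not the lecture's): size in sq ft, price in $1000s.
size = rng.uniform(500, 3500, size=200)
price = 0.1 * size + 50 + rng.normal(0, 20, size=200)


def zscore(x):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std()


def r_squared(x, y):
    """R^2 of the least-squares line fitted to y as a function of x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    return 1 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)


size_scaled = zscore(size)

# Pearson correlation between input and price, before and after scaling.
corr_raw = np.corrcoef(size, price)[0, 1]
corr_scaled = np.corrcoef(size_scaled, price)[0, 1]

# Quality of the best-fit line, before and after scaling.
r2_raw = r_squared(size, price)
r2_scaled = r_squared(size_scaled, price)

print("correlation:", corr_raw, corr_scaled)
print("R^2:        ", r2_raw, r2_scaled)
```

If the two correlations and the two R² values agree (up to floating-point error), the best-fit line fits the scaled data exactly as well as the raw data. In that case the apparent loss of linearity would likely come from how the plots are drawn, such as different axis ranges, rather than from the data itself.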