Feature Scaling Part 1: optimizing number of elements in dw

AKazak · November 24, 2024, 4:19pm

Greetings!

While watching the lecture, I had a question: if, in the case of linear regression, we manage to make the contours of the cost function circular by using feature rescaling, is it still worth using different values in the dw vector?

Does it make sense to set all the gradient components equal so that gradient descent follows a straight-line trajectory (see the image below)?

TMosh · November 24, 2024, 5:12pm

No, it does not.

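Regardless of the scaling of the features, each one may have a different significance in predicting the outputs. So each one needs its own weight, and therefore its own component in dw. If every component of dw were forced to be equal, all the weights would change by the same amount at every step and could never settle at different values.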
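To make this concrete, here is a minimal NumPy sketch. It is not from the course labs. The synthetic data, the "true" weights (3.0 and 0.5), the bias (4.0), the learning rate, and the helper names `gradients`, `cost` and `descend` are all assumptions chosen for illustration. Both features are z-score normalized, so the cost contours are close to circular, yet the two components of dw still differ because each one depends on how its feature relates to the target.

```python
import numpy as np

# Synthetic, noise-free data (assumed for illustration only).
rng = np.random.default_rng(0)
m = 1000
X = rng.normal(size=(m, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score normalization
true_w = np.array([3.0, 0.5])              # features differ in importance
y = X @ true_w + 4.0

def gradients(X, y, w, b):
    """Gradients of the squared-error cost for linear regression."""
    err = X @ w + b - y
    dw = X.T @ err / len(y)                # one component per feature
    db = err.mean()
    return dw, db

def cost(X, y, w, b):
    """Squared-error cost, J = mean(err^2) / 2."""
    return np.mean((X @ w + b - y) ** 2) / 2

def descend(X, y, equalize, alpha=0.1, steps=500):
    """Batch gradient descent; optionally force all dw components equal."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        dw, db = gradients(X, y, w, b)
        if equalize:
            # The idea from the question: one shared value for every component.
            dw = np.full_like(dw, dw.mean())
        w -= alpha * dw
        b -= alpha * db
    return w, b

dw0, _ = gradients(X, y, np.zeros(2), 0.0)
w_full, b_full = descend(X, y, equalize=False)
w_eq, b_eq = descend(X, y, equalize=True)

print("dw at the start:", dw0)
print("per-feature dw :", w_full, "cost:", cost(X, y, w_full, b_full))
print("equalized dw   :", w_eq, "cost:", cost(X, y, w_eq, b_eq))
```

With the per-feature gradient, the weights should recover the values used to generate the data. With the equalized gradient, both weights start at zero and receive identical updates, so they stay equal to each other and the cost stays above zero. Circular contours make the path to the minimum straight, but that straight line points wherever the minimum is, which is generally not along the diagonal where all the weights are equal.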
