Greetings!
While watching the lecture, I had a question: if, in the case of linear regression, we managed to make the contours of the cost function circular by using feature rescaling, is it still worth using different values in the dw vector?
Would it make sense to set all the gradient components equal so that gradient descent follows a straight-line trajectory (see the image below)?
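
To make the question concrete, here is a minimal sketch I put together. It is my own toy setup, not something from the lecture. I assume a cost with perfectly circular contours, J(w) = 0.5 * ||w - w_star||^2, where `w_star`, the learning rate, and the `equalize` flag are all hypothetical choices for illustration. It compares ordinary gradient descent with a variant that replaces every component of dw by their mean.

```python
import numpy as np

# Hypothetical toy cost with circular contours:
#   J(w) = 0.5 * ||w - w_star||^2, so the gradient is dw = w - w_star.
w_star = np.array([3.0, -1.0])  # assumed location of the minimum


def grad(w):
    """Gradient of the toy cost at w."""
    return w - w_star


def descend(w0, lr=0.1, steps=200, equalize=False):
    """Run gradient descent and return the visited points as an array.

    If equalize is True, every component of dw is replaced by the mean
    of dw, which is one way to read "equating all the gradients".
    """
    w = np.array(w0, dtype=float)
    path = [w.copy()]
    for _ in range(steps):
        dw = grad(w)
        if equalize:
            dw = np.full_like(dw, dw.mean())
        w = w - lr * dw
        path.append(w.copy())
    return np.array(path)


plain = descend([0.0, 0.0])
equal = descend([0.0, 0.0], equalize=True)

print("plain GD ends at:    ", plain[-1])
print("equalized GD ends at:", equal[-1])
```

In this toy case, if I am reading it right, the ordinary update already moves in a straight line toward `w_star`, because with circular contours the negative gradient points directly at the minimum. The equalized variant also moves in a straight line, but along the diagonal, and it settles at a point that is not the minimum. So the different values in dw seem to be what aims the straight line at the right place.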