I have a conceptual question about gradients and feature scaling, based on the image below:
As I learned in my multivariable calculus class in college, the gradient is always perpendicular to the level sets (contours). Given that, how can the path in the first part of the image oscillate back and forth across the contours, if each step follows a gradient that is perpendicular to them? And how would that oscillation be attributed to a lack of feature scaling (assuming that the learning rate is not too large)?
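To make the setup concrete, here is a minimal sketch of the situation as I understand it. It assumes a simple quadratic loss f(w) = 0.5 * (s1 * w1^2 + s2 * w2^2), where the hypothetical `scales` values stand in for unscaled versus scaled features. The function names, starting point, and learning rate are all my own choices for illustration and are not taken from the image. The gradient at each point is still perpendicular to the contour through that point, but each update is a finite straight-line step, so with elongated contours it may overshoot the valley floor in the steep direction.

```python
import numpy as np

def gradient_descent(scales, lr, start, steps=50):
    """Plain gradient descent on f(w) = 0.5 * sum(scales * w**2)."""
    scales = np.asarray(scales, dtype=float)
    w = np.asarray(start, dtype=float)
    path = [w.copy()]
    for _ in range(steps):
        grad = scales * w          # exact gradient of the quadratic
        w = w - lr * grad          # finite step along the negative gradient
        path.append(w.copy())
    return np.array(path)

def sign_flips(path, axis):
    """Count how often the iterate crosses zero along one coordinate."""
    signs = np.sign(path[:, axis])
    return int(np.sum(signs[1:] * signs[:-1] < 0))

# Assumed curvatures: 1 vs 25 mimics unscaled features (elongated contours),
# 1 vs 1 mimics scaled features (circular contours).
unscaled = gradient_descent(scales=[1.0, 25.0], lr=0.07, start=[10.0, 1.0])
scaled = gradient_descent(scales=[1.0, 1.0], lr=0.07, start=[10.0, 1.0])

print("zero crossings along w2 (unscaled):", sign_flips(unscaled, axis=1))
print("zero crossings along w2 (scaled):  ", sign_flips(scaled, axis=1))
```

In this sketch the learning rate is small enough that both runs converge, yet the unscaled run still jumps from one side of the valley to the other along the steep coordinate, while the scaled run moves straight toward the minimum. I am not sure whether this is the same effect the image is meant to show.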