Week 2 Lab 3 Question About Feature Scaling


I have a conceptual question about gradients and feature scaling, related to the image below:

As I learned in my multivariable calc class at college, the gradient is always perpendicular to the level sets (contours). How, then, can the path in the first part of the image oscillate between contours, if every step points perpendicular to them? And how would that oscillation be attributed to a lack of feature scaling (assuming the learning rate is not too large)?
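One way to connect contour shape to feature scaling, offered as a sketch under stated assumptions: for the squared-error cost J(w, b) = (1/2m) * sum((X @ w + b - y)**2), the Hessian with respect to w is X^T X / m, and the ratio of its largest to smallest eigenvalue measures how stretched the elliptical contours are. The features below are hypothetical synthetic data (a "size" column in the hundreds to thousands and a "bedrooms" column from 1 to 5), not the lab's dataset, and `curvature_condition_number` is a hypothetical helper name.

```python
import numpy as np


def curvature_condition_number(X):
    """Ratio of largest to smallest eigenvalue of X^T X / m.

    For J(w, b) = (1/2m) * sum((X @ w + b - y)**2), X^T X / m is the Hessian
    with respect to w (holding b fixed). A large ratio means long, thin
    elliptical contours; a ratio near 1 means nearly circular contours.
    """
    H = X.T @ X / X.shape[0]
    eigvals = np.linalg.eigvalsh(H)  # H is symmetric, so eigvalsh applies
    return eigvals.max() / eigvals.min()


rng = np.random.default_rng(0)
m = 200
# Hypothetical features on very different scales (synthetic, not the lab's data).
size_sqft = rng.uniform(500.0, 3000.0, m)
bedrooms = rng.integers(1, 6, m).astype(float)  # integers 1 to 5
X = np.column_stack([size_sqft, bedrooms])

# Z-score normalization: subtract each column's mean, divide by its standard deviation.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print("condition number, raw features:      ", curvature_condition_number(X))
print("condition number, z-score normalized:", curvature_condition_number(X_norm))
```

With raw features the ratio is very large, so the contours are narrow valleys; after normalization the Hessian becomes the features' correlation matrix and the ratio is close to 1, so the contours are close to circles.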

That figure is just a sketch.

Andrew is drawing the worst-case situation, where the learning rate is too high and each change in the weights overshoots the best trajectory.
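Both statements can hold at once: each step really is perpendicular to the contour at the point where it starts, but in a long, narrow valley that direction points mostly across the valley rather than toward the minimum, so a finite step lands on the opposite wall. Here is a minimal sketch, assuming a hypothetical two-weight quadratic cost J(w) = 0.5 * (a*w1**2 + b*w2**2), where `a` and `b` stand in for the curvature along each weight (these names are illustrative, not from the lab). It checks perpendicularity at every step and counts how often w2 changes sign.

```python
import numpy as np


def run_gradient_descent(a, b, lr, w0, steps):
    """Plain gradient descent on the hypothetical cost J(w) = 0.5 * (a*w1**2 + b*w2**2).

    Very different a and b give long, thin elliptical contours (like unscaled
    features); a == b gives circles. Returns the visited points and the largest
    |cos(angle)| between any step and the contour tangent where it started.
    """
    w = np.array(w0, dtype=float)
    path = [w.copy()]
    max_cos_with_tangent = 0.0
    for _ in range(steps):
        grad = np.array([a * w[0], b * w[1]])
        # Rotating the gradient by 90 degrees gives the tangent to the contour through w.
        tangent = np.array([-grad[1], grad[0]])
        step = -lr * grad
        denom = np.linalg.norm(step) * np.linalg.norm(tangent)
        if denom > 0:
            max_cos_with_tangent = max(max_cos_with_tangent, abs(step @ tangent) / denom)
        w = w + step
        path.append(w.copy())
    return np.array(path), max_cos_with_tangent


def count_sign_flips(values):
    """Number of times consecutive entries change sign (the path crosses the valley floor)."""
    signs = np.sign(values)
    return int(np.sum(signs[1:] != signs[:-1]))


# Elongated contours: steep in w2 (b = 100), shallow in w1 (a = 1). lr = 0.019 is
# below the stability limit 2/b = 0.02, so the run converges, but w2 is multiplied
# by (1 - lr*b) = -0.9 each step and therefore flips sign every iteration.
elongated_path, elongated_cos = run_gradient_descent(a=1.0, b=100.0, lr=0.019, w0=(10.0, 1.0), steps=50)

# Round contours (what scaling aims for): the gradient points straight at the minimum.
round_path, round_cos = run_gradient_descent(a=1.0, b=1.0, lr=0.9, w0=(10.0, 1.0), steps=50)

print("elongated: w2 sign flips =", count_sign_flips(elongated_path[:, 1]),
      "| final distance to minimum =", np.linalg.norm(elongated_path[-1]))
print("round:     w2 sign flips =", count_sign_flips(round_path[:, 1]),
      "| final distance to minimum =", np.linalg.norm(round_path[-1]))
print("max |cos| between a step and its contour tangent:", max(elongated_cos, round_cos))
```

In the elongated case the learning rate is "not too large" in the sense that the run converges, yet the path still zig-zags across the valley and creeps slowly along w1; the round case, with the same perpendicular-to-contour rule, heads straight for the minimum.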

He could not draw all those arrows perpendicular to the curves, because they would all be right on top of each other, and the sketch would be unreadable.