Steepest descent and its visualization - is it always optimal?

I am taking Course 1 of the Machine Learning Specialization, Supervised Machine Learning: Regression and Classification. In the Week 3 optional lab, Gradient descent for logistic regression, I came across this visualization of gradient descent for logistic regression.

I have a question about gradient descent and the path of steepest descent. When I chose w and b in the bottom-right corner, the path did not go straight toward the lowest cost; it first moved toward the top left and then toward the top right. Is this the optimal path gradient descent can take, or is there some tradeoff? Visually, it looks like it would converge much faster by following the red arrow I have drawn below. Is there a resource on statistics/gradient descent that would help me understand this better?

The scaling of the two axes of the graph is not the same. Gradient descent always moves perpendicular to the local contour line (that is the direction of steepest descent), but that perpendicularity is only visible when both axes use the same scale; with unequal scaling the path appears to curve away from the minimum. To visualize the gradients faithfully, you'd have to force the plot to use equal scaling on both axes.
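To illustrate, here is a minimal sketch (not the course's lab code) that runs gradient descent on a hypothetical elongated quadratic cost, f(w, b) = 10w² + b². On such a cost the gradient is perpendicular to each contour line, so the path bends toward the minimum instead of heading straight at it; this only looks right in a plot when both axes share the same scale.

```python
import numpy as np

# Illustrative cost (an assumption, not the lab's cost): f(w, b) = 10*w^2 + b^2.
# Its contours are elongated ellipses, similar to an unscaled cost surface.
def grad(p):
    w, b = p
    return np.array([20.0 * w, 2.0 * b])  # partial derivatives df/dw, df/db

p = np.array([2.0, 2.0])   # start in a "corner" of the contour plot
alpha = 0.04               # learning rate
path = [p.copy()]
for _ in range(100):
    p = p - alpha * grad(p)  # each step is perpendicular to the local contour
    path.append(p.copy())
path = np.array(path)

print(path[-1])  # ends close to the minimum at (0, 0)

# To see the perpendicularity, plot the path over the contours with equal
# axis scaling, e.g. with matplotlib:
#   fig, ax = plt.subplots()
#   ax.plot(path[:, 0], path[:, 1], "o-")
#   ax.set_aspect("equal")   # without this, the path looks non-perpendicular
```

Note that the path converges quickly along w (the steep direction) and slowly along b, which is exactly the zig-zag/curved behavior you saw; feature scaling is what makes the contours rounder so the path heads more directly to the minimum.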