Optional Lab: Feature scaling and Learning Rate (Multi-variable)

The text says that gradient descent with α = 9e-7 runs faster than gradient descent with
α = 1e-7. However, I tried both learning rates with the number of iterations set to 20.

It turns out that the second example (α = 1e-7), after 20 iterations, almost reaches the minimum, while
the first example (α = 9e-7), with the same number of iterations, is still far from it, which contradicts the text in the first image.

We need to run more iterations to see which learning rate wins in the end.
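This comparison is easy to reproduce on a toy dataset. Below is a minimal sketch of plain batch gradient descent run with both learning rates; since the lab's actual housing data is not reproduced here, the features, targets, and seed are assumptions chosen to give the features roughly house-price scales:

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent on a linear model; returns the final cost."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(num_iters):
        err = X @ w + b - y            # prediction error, shape (m,)
        w -= alpha * (X.T @ err) / m   # gradient step for the weights
        b -= alpha * err.sum() / m     # gradient step for the bias
    err = X @ w + b - y
    return err @ err / (2 * m)         # mean squared error cost

# Hypothetical unscaled multi-variable data (assumption: stands in for
# the lab's housing dataset, which is not reproduced here).
rng = np.random.default_rng(0)
m = 100
size = rng.uniform(800, 2000, m)       # sqft-scale feature
beds = rng.integers(1, 5, m)           # small-scale feature
age = rng.uniform(0, 60, m)            # medium-scale feature
X = np.column_stack([size, beds, age])
y = 0.1 * size + 20.0 * beds - 0.5 * age + 50

for alpha in (9e-7, 1e-7):
    for iters in (20, 1000):
        cost = gradient_descent(X, y, alpha, iters)
        print(f"alpha={alpha:.0e}  iters={iters:4d}  cost={cost:.4f}")
```

Because the features are unscaled, the largest usable α is pinned down by the largest feature's magnitude; a rate near that boundary oscillates, so its short-run progress can look worse than a smaller rate's even if it eventually converges, which may explain the apparent contradiction at 20 iterations.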