Why does this lab say that the learning rate that makes the cost J jump back and forth across the global minimum (alpha = 9e-7) converges faster than the third learning rate (alpha = 1e-7)? When I look at the plot of J versus the number of iterations, J heads straight toward the global minimum with the third learning rate, and after 10 iterations J is much higher with alpha number 2 than with alpha number 3.
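For reference, here is a minimal sketch of the comparison I mean. This is not the lab's code or dataset; the data, `compute_cost`, and `run_gradient_descent` below are illustrative stand-ins on roughly the same feature scale, just to show both learning rates run for 10 iterations:

```python
import numpy as np

# Hypothetical single-feature data on roughly the same scale as the lab's
# house-size feature (this is NOT the lab's dataset, just an illustration).
x = np.array([1200.0, 1300.0, 1400.0, 1500.0])   # e.g. size in sqft
y = np.array([300.0, 330.0, 360.0, 390.0])       # e.g. price in $1000s

def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = (1 / 2m) * sum((w*x + b - y)^2)."""
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)

def run_gradient_descent(x, y, alpha, num_iters):
    """Batch gradient descent from w = b = 0, recording J at every step."""
    m = x.shape[0]
    w, b = 0.0, 0.0
    history = []
    for _ in range(num_iters):
        err = w * x + b - y
        w -= alpha * np.dot(err, x) / m   # step along -dJ/dw
        b -= alpha * np.sum(err) / m      # step along -dJ/db
        history.append(compute_cost(x, y, w, b))
    return history

# Compare the two learning rates from the question over 10 iterations.
for alpha in (9e-7, 1e-7):
    history = run_gradient_descent(x, y, alpha, num_iters=10)
    print(f"alpha = {alpha:.0e}: J per iteration = "
          + ", ".join(f"{J:.1f}" for J in history))
```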
I would be grateful for an explanation.