If the learning rate is too high, gradient descent can overshoot the minimum on each step, so it may fail to converge on a local minimum at all.

What’s the problem with a learning rate that is too low? Does it just result in more calculations than necessary because you’re taking such tiny steps downhill each time?

How do you decide what learning rate to start with?

Your observation is right. In the videos for that course, Prof. Andrew Ng gives advice on how to choose the learning rate: start with a small value and progressively test larger ones, watching how the cost decreases.
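To see why both extremes are a problem, here is a minimal sketch (not from the course, just a toy example) running gradient descent on f(x) = x², whose gradient is 2x, with a low, a reasonable, and a too-high learning rate:

```python
def gradient_descent(lr, x0=1.0, steps=50):
    """Run plain gradient descent on f(x) = x^2 starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # gradient of x^2 is 2x
    return x

for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr}: x after 50 steps = {gradient_descent(lr):.4g}")
```

With lr=0.01 you still haven't reached the minimum after 50 steps (slow, wasted computation), with lr=0.1 you get very close to 0, and with lr=1.1 each step overshoots by more than it corrects, so x actually grows and the method diverges.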

There are also techniques such as grid search, random search, and Bayesian optimization that can automate the search for hyperparameters. You can look them up on Google.
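As an illustration of one of those, here is a hypothetical random-search sketch that samples learning rates log-uniformly (a common choice, since learning rates span several orders of magnitude) and keeps the one with the lowest final loss on a toy objective; the range 1e-4 to 1 and the objective are just assumptions for the example:

```python
import math
import random

def random_search_lr(objective, trials=20, low=1e-4, high=1.0, seed=0):
    """Sample learning rates log-uniformly in [low, high] and
    return the one giving the smallest objective value."""
    rng = random.Random(seed)
    best_lr, best_loss = None, float("inf")
    for _ in range(trials):
        lr = 10 ** rng.uniform(math.log10(low), math.log10(high))
        loss = objective(lr)
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss

def toy_objective(lr, steps=50):
    """Final loss of gradient descent on f(x) = x^2 (toy stand-in
    for training a model and reading off the final cost)."""
    x = 1.0
    for _ in range(steps):
        x -= lr * 2 * x
    return x * x

best_lr, best_loss = random_search_lr(toy_objective)
print(f"best lr ~ {best_lr:.4g}, final loss = {best_loss:.3g}")
```

In practice the objective would be the validation loss after a short training run; libraries like scikit-learn (`RandomizedSearchCV`) wrap this same idea for you.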

OK, thanks. I see that there is a video about choosing the learning rate in Week 2 (the 4th video).