In the last video before the quiz, “Deep Neural Network”, Laurence makes the comment about choosing the optimal learning rate: "In this case it looks to be about two notches to the left of 10 to the minus 5.
So I’ll say it’s 8 times 10 to the minus 6, or thereabouts."
How does one get from 10**-5 to 8*10**-6? Where does this offset come from?
As you might’ve observed in the figure, 10^{-5} is the point beyond which the loss gets unstable.
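The x-axis of that plot is logarithmic, so the minor ticks between 10^{-6} and 10^{-5} mark 2*10^{-6}, 3*10^{-6}, ..., 9*10^{-6}. Two notches to the left of 10^{-5} is therefore 8*10^{-6}. That value lies inside the region below 10^{-5} where the loss is low and relatively stable. There is no fixed rule for which value to pick; the goal is to read a reasonable learning rate off the graph. You could try a rate like 9*10^{-6} as well.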
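If it helps, here is a small Python sketch of the "two notches to the left" reading from the quote. It assumes the plot uses a log-scaled x-axis with the usual minor ticks at 1x to 9x each power of ten; the helper name `notches_left` is made up for illustration and is not from the course notebook.

```python
# Minor ticks on a log-scaled axis sit at 1x, 2x, ..., 9x each power of ten.
decade_start = 1e-6  # the major tick just left of 1e-5 (assumed from the plot)
ticks = [k * decade_start for k in range(1, 11)]  # 1e-6, 2e-6, ..., 9e-6, 1e-5

def notches_left(ticks, major, n):
    """Return the tick n positions to the left of the tick nearest `major`."""
    i = min(range(len(ticks)), key=lambda j: abs(ticks[j] - major))
    return ticks[i - n]

lr = notches_left(ticks, 1e-5, 2)
print(f"{lr:.0e}")  # prints 8e-06
```

Stepping one notch instead of two gives 9e-06, which is the other rate worth trying.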
Thank you for clarifying.