In the week 3 quiz, β (hyperparameter for momentum) is between 0.9 and 0.99, which is the recommended way to sample a value for beta? How did we arrive at the answer? Can someone explain the idea behind this?
Because of the role of that parameter in the formula, you use a search on a logarithmic scale. It’s been 3 or 4 years since I watched those lectures, but I’m pretty sure I remember Prof Ng specifically making that point about using a log based search. It’s worth another look/listen to the lectures.
I’d guess it’s by experimentation.