In the course, Dr. Ng shows a slide recommending that we choose the learning rate (alpha) by starting at 0.001 and going UP by 3x each time until we find a good rate.
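For reference, here is a minimal Python sketch of that search grid. It assumes "up by 3x" means the usual round-number pattern 0.001, 0.003, 0.01, 0.03, and so on. The function name `learning_rate_candidates` is my own and does not come from the course:

```python
def learning_rate_candidates(start_exp=-3, stop_exp=0):
    """Candidate alphas 1e{start_exp}, 3e{start_exp}, 1e{start_exp+1}, ..., 1e{stop_exp}.

    Consecutive values differ by roughly 3x (alternating x3 and x10/3),
    so each factor of 10 is covered in two steps.
    """
    rates = []
    for exp in range(start_exp, stop_exp):
        rates.append(10.0 ** exp)      # 1 x 10^exp
        rates.append(3 * 10.0 ** exp)  # 3 x 10^exp
    rates.append(10.0 ** stop_exp)     # close the range at 10^stop_exp
    return rates


for alpha in learning_rate_candidates():
    print(f"{alpha:g}")  # 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1
```

Alternating x3 and x10/3 keeps every candidate a round number. A strict x3 sequence would drift to 0.009, 0.027, 0.081, and so on.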
However, in the Optional Lab C1_W2_Lab03, the first learning rate chosen is 9.9e-07. Perhaps this was meant to demonstrate a bad learning rate.
But the next two choices, both of which worked fine, were alpha = 9e-07 and alpha = 1e-07.
Both of these are far smaller than the 0.001 suggested in the notes. So if I had followed the class notes and started at 0.001, I would have had to go down (not up), by a factor of 10 each time, until I reached the “good” alpha of 1e-07.
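Here is a quick check of that arithmetic, as a small Python sketch. The helper `steps_needed` is hypothetical. It counts how many multiply- or divide-by-`factor` steps it takes to get from 0.001 down to 1e-07:

```python
import math


def steps_needed(start, target, factor):
    """Number of multiply/divide-by-`factor` steps needed to get from start to target."""
    ratio = max(start, target) / min(start, target)
    # Subtract a tiny epsilon so floating-point noise (e.g. 4.0000000000000001)
    # does not round an exact power up by one extra step.
    return math.ceil(math.log(ratio, factor) - 1e-9)


print(steps_needed(1e-3, 1e-7, 10))  # dividing by 10 each time -> 4
print(steps_needed(1e-3, 1e-7, 3))   # dividing by 3 each time  -> 9
```

The gap between 0.001 and 1e-07 is a factor of 10,000. That is four steps of /10, or nine steps of /3 if you kept using the slide's 3x spacing in the downward direction.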
So my question is: how does one pick a good starting learning rate in general? And how/why was 9.9e-07 chosen?
Never mind… I think I figured it out as I completed the rest of the lab.
What I took away is this. If we scale (normalize) the data set so that the features are 1) centered around zero and 2) of similar scale, then we don’t have to worry about picking extreme learning rates. In that case we can choose learning rates the way Dr. Ng showed in the slides, e.g. starting at 0.001 and going up by 3x. I tried this on the normalized set, and the approach matches the slides.
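Below is a minimal NumPy sketch of that takeaway. It uses z-score normalization (subtract each column's mean, divide by its standard deviation) and then runs a short 10-iteration gradient descent trial for each alpha to see whether the cost goes down. The data are synthetic and hypothetical, not the lab's data set: a "size" feature in the thousands and a "bedrooms" feature in single digits. The helper names `zscore_normalize`, `gradient_descent` and `decreases` are my own, and the lab's own normalization code may differ in details.

```python
import numpy as np


def zscore_normalize(X):
    """Center each column at 0 and scale it to unit (population) standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma


def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent for linear regression; returns the cost history.

    Cost is J(w, b) = (1 / 2m) * sum((X @ w + b - y) ** 2); costs[k] is J before update k.
    """
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    costs = []
    for _ in range(num_iters):
        err = X @ w + b - y
        costs.append(float(err @ err) / (2 * m))
        w = w - alpha * (X.T @ err) / m  # gradient of J w.r.t. w
        b = b - alpha * err.mean()       # gradient of J w.r.t. b
    return costs


# Hypothetical housing-style data (NOT the lab's data): size in sq ft spans
# thousands, bedroom count spans single digits.
rng = np.random.default_rng(0)
m = 100
size = rng.uniform(800, 3000, m)
bedrooms = rng.integers(1, 6, m).astype(float)
X_raw = np.column_stack([size, bedrooms])
y = 0.1 * size + 20 * bedrooms + 50 + rng.normal(0, 10, m)

X_norm, mu, sigma = zscore_normalize(X_raw)


def decreases(X, alpha, num_iters=10):
    """Short trial: does the cost go down over a few iterations with this alpha?"""
    costs = gradient_descent(X, y, alpha, num_iters)
    return costs[-1] < costs[0]


for alpha in [1e-7, 1e-3]:
    print(f"raw        alpha={alpha:g}: cost decreases? {decreases(X_raw, alpha)}")
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]:
    print(f"normalized alpha={alpha:g}: cost decreases? {decreases(X_norm, alpha)}")
```

On the raw features, the "size" column dominates the curvature of the cost. Only a tiny rate such as 1e-07 is stable, and 0.001 makes the cost blow up. After z-scoring, every column has mean 0 and standard deviation 1, so the slide-style sweep from 0.001 to 0.3 lowers the cost at every step. The exact cutoff for the raw data depends on the feature ranges, which is presumably why the lab's raw-data values sit around 1e-07.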
Do let me know if you agree this is the right “takeaway” for the lab.
Thanks in advance, TMosh!