Appropriate scale to pick hyperparameters (Week 3)

Hi Sir,

@paulinpaloalto @bahadir @eruzanski @Carina @neurogeek @lucapug @javier @kampamocha

Regarding the lecture "Using an Appropriate Scale to pick Hyperparameters", I have a couple of doubts. Could you please help clarify them?

  1. At the 7:02 mark, what does it mean when beta goes from 0.999 to 0.9995? Why does it change from 0.999 to 0.9995? What is the reason?

  2. I don't understand why a linear scale is a bad idea when choosing the beta used to compute an exponentially weighted average. Could you please help clarify?

For 1), it is just because Prof Ng is changing it to show you what happens. That’s the point with hyperparameters, right? You change them and see what happens. The way to know what to do is to run experiments and then understand the results.

For item 2), I think he does quite a clear job of explaining this in the lectures. The point is that if the quantity you are experimenting with or “sampling” is fundamentally exponential or logarithmic, then using a linear scale to select the possible choices gives you bad results because you don’t get enough choices that actually explore the interesting parts of the spectrum. In the exponential case here, just varying by 0.1 across the range misses exactly the point he is making in case 1) above. I suggest you watch the lecture again with what I said in mind. If my memory serves, I’m pretty sure he actually says his version of exactly what I just said above.
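To put numbers on why the region near 1 is the interesting one, here is a minimal sketch. It assumes the usual rule of thumb that an exponentially weighted average with parameter beta averages over roughly 1 / (1 - beta) values. The function name `effective_window` is just something I made up for illustration.

```python
def effective_window(beta):
    """Rule-of-thumb number of values averaged over: 1 / (1 - beta)."""
    return 1.0 / (1.0 - beta)


# Compare the same absolute change in beta (0.0005) at two places in the range.
for before, after in [(0.9000, 0.9005), (0.9990, 0.9995)]:
    print(
        f"beta {before:.4f} -> {after:.4f}: "
        f"window {effective_window(before):.2f} -> {effective_window(after):.2f}"
    )
```

The same step of 0.0005 barely moves the window when beta is around 0.9 (about 10 values either way), but it roughly doubles the window near 0.999 (from about 1000 to about 2000 values). That is why small changes close to 1 matter so much more.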


@paulinpaloalto Thanks for the reply, sir. Here is my understanding of the second point (why a linear scale is a bad idea for the exponentially weighted average) and of Prof Ng's statement that what this whole sampling process does is cause you to sample more densely in the region where beta is close to 1.

My intuition: if we sample uniformly on a linear scale, more beta values are allocated to the region close to 1. So we average over a larger number of days and always end up with a smooth update, which is not always what the problem needs. Am I right, sir?

It’s possible that this is just a language problem, but I believe it’s exactly the opposite of that. The problem is that if you sample linearly, then you don’t get very good coverage of the region that is close to 1. You just get 0.8 and 0.9, right? What about the range between 0.9 and 1? That’s the point and I think Prof Ng does a fine job of explaining that in the lectures.
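If it helps to see the coverage difference directly, here is a rough sketch that compares the two sampling schemes. The search range of 0.9 to 0.999, the helper names, and the seed are my own choices for illustration. The log-scale version samples r uniformly and then sets beta = 1 - 10^r, which is my reading of the approach described in the lecture.

```python
import math
import random


def sample_beta_linear(rng, low=0.9, high=0.999):
    """Sample beta uniformly on a linear scale."""
    return rng.uniform(low, high)


def sample_beta_log(rng, low=0.9, high=0.999):
    """Sample 1 - beta uniformly on a log scale: r in [-3, -1], beta = 1 - 10**r."""
    r = rng.uniform(math.log10(1 - high), math.log10(1 - low))
    return 1 - 10 ** r


def fraction_near_one(sampler, n=10000, threshold=0.99, seed=0):
    """Fraction of n sampled betas that are at or above the threshold."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if sampler(rng) >= threshold)
    return hits / n


linear_fraction = fraction_near_one(sample_beta_linear)
log_fraction = fraction_near_one(sample_beta_log)
print(f"linear scale: {linear_fraction:.1%} of samples have beta >= 0.99")
print(f"log scale:    {log_fraction:.1%} of samples have beta >= 0.99")
```

With linear sampling, only about 9% of the samples should land between 0.99 and 0.999, because that interval is only 0.009 wide out of 0.099. With the log-scale version, about half of them land there, since r between -3 and -2 is half of the range of r. So the log scale is the one that gives dense coverage close to 1.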