Appropriate scale to pick hyperparameter week 3

Anbu · July 24, 2021, 7:14am

Hi Sir,

@paulinpaloalto @bahadir @eruzanski @Carina @neurogeek @lucapug @javier @kampamocha

Regarding the lecture "Using an Appropriate Scale to Pick Hyperparameters", I have a couple of doubts. Could you please help clarify them?

1) At 7:02 in the lecture, what does it mean when beta goes from 0.999 to 0.9995? Why does it change from 0.999 to 0.9995?
2) I don't understand why a linear scale is a bad idea for the beta used to compute an exponentially weighted average. Could you please clarify?

paulinpaloalto · July 24, 2021, 2:15pm

For 1), it is just because Prof Ng is changing it to show you what happens. That’s the point with hyperparameters, right? You change them and see what happens. The way to know what to do is to run experiments and then understand the results.

For item 2), I think he does quite a clear job of explaining this in the lectures. The point is that if the quantity you are experimenting with or “sampling” is fundamentally exponential or logarithmic, then using a linear scale to select the possible choices gives you bad results because you don’t get enough choices that actually explore the interesting parts of the spectrum. In the exponential case here, just varying by 0.1 across the range misses exactly the point he is making in case 1) above. I suggest you watch the lecture again with what I said in mind. If my memory serves, I’m pretty sure he actually says his version of exactly what I just said above.
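To put rough numbers on this: a rule of thumb used in the course is that an exponentially weighted average with parameter beta averages over about 1 / (1 - beta) values. The sketch below applies that approximation. The helper name `effective_window` is made up for illustration, and the 0.9 to 0.9005 pair is added only as a contrast to the 0.999 to 0.9995 step asked about in 1).

```python
def effective_window(beta):
    """Approximate number of values an exponentially weighted average
    spans, using the rule of thumb 1 / (1 - beta)."""
    return 1.0 / (1.0 - beta)


# Same absolute change in beta (0.0005), very different effect on the window.
for low, high in [(0.9, 0.9005), (0.999, 0.9995)]:
    print(
        f"beta {low} -> {high}: "
        f"window {effective_window(low):.1f} -> {effective_window(high):.1f}"
    )
```

Near 0.9 the window barely moves (about 10 in both cases), while going from 0.999 to 0.9995 roughly doubles it from about 1000 to about 2000. That sensitivity close to 1 is why evenly spaced values on a linear scale explore the interesting region so poorly.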

Anbu · August 4, 2021, 10:24am

@paulinpaloalto Thanks for the reply, sir. Here is my understanding of the second point (why a linear scale is a bad idea for the exponentially weighted average), and of the professor's statement that what this whole sampling process does is cause you to sample more densely in the region where beta is close to 1.

My intuition: if we sample uniformly on a linear scale, more beta values are allocated to the region close to 1. So we average over a larger number of days and always end up with a smooth update, which is not always what the problem needs. Am I right, sir?

paulinpaloalto · August 8, 2021, 1:07am

It’s possible that this is just a language problem, but I believe it’s exactly the opposite of that. The problem is that if you sample linearly, then you don’t get very good coverage of the region that is close to 1. You just get 0.8 and 0.9, right? What about the range between 0.9 and 1? That’s the point and I think Prof Ng does a fine job of explaining that in the lectures.
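A small experiment can make the coverage difference visible. This is only a sketch under assumptions: the range 0.9 to 0.999, the sample count, the seed, and the function names `sample_beta_linear` and `sample_beta_log` are all chosen for illustration. The log-scale version samples an exponent r uniformly and sets beta = 1 - 10^r, which is the idea described in the lecture.

```python
import math
import random


def sample_beta_linear(rng, low=0.9, high=0.999):
    """Draw beta uniformly on a linear scale between low and high."""
    return rng.uniform(low, high)


def sample_beta_log(rng, low=0.9, high=0.999):
    """Draw beta so that (1 - beta) is uniform on a log10 scale."""
    r_min = math.log10(1 - high)  # about -3 for high = 0.999
    r_max = math.log10(1 - low)   # about -1 for low = 0.9
    r = rng.uniform(r_min, r_max)
    return 1 - 10 ** r


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000
linear = [sample_beta_linear(rng) for _ in range(n)]
logscale = [sample_beta_log(rng) for _ in range(n)]

# Fraction of samples that land in the sensitive region above 0.99.
frac_linear = sum(b > 0.99 for b in linear) / n
frac_log = sum(b > 0.99 for b in logscale) / n
print(f"linear scale: {frac_linear:.1%} of samples above 0.99")
print(f"log scale:    {frac_log:.1%} of samples above 0.99")
```

With these assumed bounds, the linear scale is expected to put only about 9% of its samples above 0.99, while the log scale puts about half of them there, which is the denser sampling close to 1 that the lecture refers to.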
