Why does limiting the range of hyperparameters during tuning yield better results?

During the discussion about hyperparameter optimization/tuning, the instructor says,

You might feel like specifying a large range of values so that you can explore all possible values. But in fact, you get better results by limiting your search to a small range of values.

Why is that?

Hi Dreddy,

I think this is mainly about saving computation time during the optimization. If you choose a narrower range that is reasonable for the data and the model, the tuner will converge to a good solution faster.

I am not 100% sure, but I suspect it could also be a result of how SageMaker does the tuning. As far as I have read, it uses a combination of Bayesian optimization and random search. A very wide range could also lead the algorithm to converge to a local optimum that it finds with an unusual combination of parameters. In that case, too, it is better to restrict the range to values where these hyperparameters would normally fall.
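
To make that concrete, here is a minimal sketch (my own, not from the course notebook) of narrowing the ranges for a SageMaker tuning job with the SageMaker Python SDK. The estimator `xgb_estimator`, the metric name, and the specific range values are assumptions for illustration only:

```python
# Sketch: narrow, sensibly scaled hyperparameter ranges for a SageMaker tuning job.
# `xgb_estimator` is assumed to be an already-configured SageMaker Estimator.
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

hyperparameter_ranges = {
    # Narrow, log-scaled range around values where the learning rate (eta)
    # typically works well, instead of something very wide like (0.0001, 1.0)
    "eta": ContinuousParameter(0.01, 0.3, scaling_type="Logarithmic"),
    "max_depth": IntegerParameter(3, 8),
}

tuner = HyperparameterTuner(
    estimator=xgb_estimator,                 # assumed estimator, placeholder
    objective_metric_name="validation:auc",  # assumed objective metric
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    strategy="Bayesian",                     # SageMaker's default search strategy
    max_jobs=20,
    max_parallel_jobs=2,
)
# tuner.fit({"train": train_input, "validation": val_input})
```

With fewer plausible values to explore, the Bayesian search spends its limited `max_jobs` budget refining good candidates instead of ruling out unreasonable ones.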

Hope this helps,
Nils

Hi @dreddy, great question!

To add to what @Nils explained, it is good practice, after doing a coarse search over an entire "wider" range of a hyperparameter (e.g., the learning rate), to sample more densely within a smaller, more focused range.

For instance, test out more values of the learning rate between 0.01 and 0.1 than between 0.1 and 1.
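
Here is a tiny sketch of that coarse-to-fine idea using plain NumPy (my own example, with made-up ranges and sample counts); sampling the exponent uniformly gives a log-uniform spread of learning rates:

```python
# Coarse-to-fine search for a learning rate on a log scale.
import numpy as np

rng = np.random.default_rng(0)

# Coarse pass: sample learning rates log-uniformly over a wide range (1e-4 to 1)
coarse = 10 ** rng.uniform(-4, 0, size=20)

# Suppose the best coarse trials landed around 0.01-0.1; refine by sampling
# the same number of points inside that much narrower range
fine = 10 ** rng.uniform(-2, -1, size=20)

print(np.sort(coarse))
print(np.sort(fine))
```

The second pass puts all 20 trials where the first pass suggested good values live, which is exactly the "sample more densely in a smaller range" step.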

Thanks!

In the Deep Learning Specialization, in Course 3, Prof. Ng explains it very well…