Hyperparameter Tuning

How can we know how important a certain hyperparameter is compared to the others?
Prof. Andrew stated that tuning the learning rate (alpha) is more important than tuning the number of hidden layers. How did he figure that out?

Hey @Lina_Hourieh,
Well, that’s a nice question. One way to look at this is to find out how much each hyper-parameter can influence your model’s outputs, and assign relative importance accordingly. For instance, certain hyper-parameters like the number of neurons in each layer, the number of layers, or the activation function in each layer often affect the model’s outputs to a relatively small extent, while others like the learning rate can affect the outputs to a much larger extent; hence, we focus more on tuning the learning rate. This relative importance is something you find out after performing lots and lots of experiments.
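To make the "how much does each hyper-parameter influence the outputs" idea concrete, here is a minimal sketch of the kind of experiment involved. The loss function below is a made-up toy (an assumption for illustration, not a real model): it is deliberately very sensitive to the learning rate and only mildly sensitive to depth. Varying one hyper-parameter while holding the other fixed then shows which one moves the validation loss more:

```python
import math
import random

random.seed(0)

def toy_val_loss(alpha, n_layers):
    """Hypothetical validation loss (illustrative assumption only):
    strongly sensitive to log10(alpha), weakly sensitive to depth."""
    return (math.log10(alpha) + 2.0) ** 2 + 0.05 * (n_layers - 3) ** 2

# Vary only the learning rate (sampled log-uniformly), depth held at 3.
losses_vary_alpha = [
    toy_val_loss(10 ** random.uniform(-5, -1), 3) for _ in range(200)
]

# Vary only the depth (1 to 6 layers), learning rate held at 1e-2.
losses_vary_depth = [
    toy_val_loss(1e-2, random.randint(1, 6)) for _ in range(200)
]

def spread(losses):
    """Range of the loss attributable to the varied hyper-parameter."""
    return max(losses) - min(losses)

print(f"loss spread from varying alpha: {spread(losses_vary_alpha):.2f}")
print(f"loss spread from varying depth: {spread(losses_vary_depth):.2f}")
```

Under these (assumed) sensitivities, the spread caused by the learning rate dwarfs the spread caused by depth, which is exactly the kind of evidence that makes one rank alpha as the more important knob. Note also the log-uniform sampling of alpha: because reasonable learning rates span several orders of magnitude, sampling on a log scale explores them far more evenly than a plain uniform draw would.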

Moreover, with more and more practical results, one often tends to give less importance to certain hyper-parameters and more to others. For instance, after you have trained a lot of neural networks yourself, you will gain some intuition about the number of layers suited to a given task, and you can start your model with good initial values. In some cases, you might need to increase or decrease the number of layers, and that’s perfectly acceptable, but in most cases, you will find that the initial number of layers you started with is quite acceptable.

However, I am assuming that when we compare 2 hyper-parameters for their importance, we are also trying to assign the best initial estimates we can. For instance, if someone starts with a 1-layer neural network and, without tuning the number of layers at all, only tunes the learning rate, and then expects good results, well, that person is in for bad luck!

To conclude, as your experience grows, you will learn good initial estimates for certain hyper-parameters, and you can then easily focus on tuning the rest while keeping those in the back seat.

It’s more like you are a neural network: your experiments are the inputs to the model, and you are learning the weights, which are analogous to good initial estimates of certain hyper-parameters. As the iterations (experience) increase, you learn to produce better outputs (better neural networks with less hyper-tuning). I hope this helps.


Very satisfying answer :+1: … Thank you