Deep Learning for Hyperparameters


I’ve just finished the video on Exponentially Weighted Averages, and we introduced yet another hyperparameter. So far we have the learning rate, lambda, probably the mini-batch size, the number of iterations, and more to come.

Is there such a thing as applying deep learning to the choice of the hyperparameters themselves? Gradient descent would then search for the best possible combination. I suspect this is not feasible, because every evaluation of the outer cost would imply a full training run of deep learning on the underlying problem, which is potentially impossible to vectorize.
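To make that cost concrete, here is a minimal sketch of the idea. Everything in it is hypothetical (a one-parameter toy model, not anything from the course): the outer search over hyperparameters has no cheap gradient, so each candidate setting is evaluated by running the entire inner training loop from scratch.

```python
# Hypothetical toy illustration: each hyperparameter evaluation in the
# outer search is itself a *complete* training run of the inner model.
# The inner model here is one-parameter linear regression on y = 2x.

DATA = [(x, 2.0 * x) for x in range(1, 6)]

def train(learning_rate, n_iters):
    """Run full gradient descent on the inner problem; return final MSE."""
    w = 0.0
    for _ in range(n_iters):
        # gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in DATA) / len(DATA)
        w -= learning_rate * grad
    return sum((w * x - y) ** 2 for x, y in DATA) / len(DATA)

# Outer loop: plain grid search over (learning_rate, n_iters).
# There is no cheap gradient of the final loss with respect to these
# choices -- every grid point costs one whole inner training run.
grid = [(lr, n) for lr in (0.001, 0.01, 0.05) for n in (10, 100)]
best_hp, best_loss = min(
    ((hp, train(*hp)) for hp in grid),
    key=lambda item: item[1],
)
```

Even on this tiny toy, the outer search multiplies the training cost by the number of grid points, which is exactly why gradient-free searches (grid, random, Bayesian) are the usual tools here rather than gradient descent.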

It feels impractical due to resource limitations. But setting that aside, would this ever make sense?

Thank you.

It’s an interesting idea, but I have the same reaction you describe: it would be cool, but it just seems too complicated. The high-level problem is that the search space is simply too large. Maybe some day we’ll have the compute power to solve this as a DL problem, but for now it sounds like we have to listen to Prof Ng and cultivate our intuitions about which directions are the most useful to explore.