Hi,

I’ve just finished the video on exponentially weighted averages, and we’ve introduced yet another hyperparameter. So far we have the learning rate, lambda, probably the mini-batch size, the number of iterations, and more to come.

Is there such a thing as applying deep learning to the choice of hyperparameters themselves? Our gradient descent would then figure out the best possible combination. I think this is not possible, because every evaluation of the outer cost would mean running deep learning on the underlying problem, which is potentially impossible to vectorize.
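To make the cost concrete, here is a minimal sketch (my own toy example, not from the course) of hyperparameter tuning as an outer loop: random search over the learning rate and iteration count, where evaluating a single candidate requires a complete inner training run of a tiny logistic-regression model. The data, search ranges, and function names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny binary-classification dataset.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def train_and_evaluate(learning_rate, num_iterations):
    """One 'inner' run: full gradient-descent training of a logistic
    regression for the given hyperparameters; returns final log loss."""
    w = np.zeros(2)
    b = 0.0
    for _ in range(num_iterations):
        z = X @ w + b
        a = 1.0 / (1.0 + np.exp(-z))          # sigmoid activation
        dz = a - y
        w -= learning_rate * (X.T @ dz) / len(y)
        b -= learning_rate * dz.mean()
    a = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    eps = 1e-12                               # avoid log(0)
    return -np.mean(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps))

# 'Outer' loop: each candidate costs one complete training run,
# which is why the outer problem is so expensive and does not
# vectorize the way a single forward/backward pass does.
best = None
for _ in range(10):
    lr = 10 ** rng.uniform(-3, 0)             # sample lr on a log scale
    iters = int(rng.integers(50, 500))
    loss = train_and_evaluate(lr, iters)
    if best is None or loss < best[0]:
        best = (loss, lr, iters)

print(best)  # (best loss, best learning rate, best iteration count)
```

In practice the outer loop usually uses gradient-free methods like random search or Bayesian optimization precisely because the outer objective is expensive and its gradient with respect to the hyperparameters is not readily available.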

It feels impractical due to resource limitations, but aside from that, would this ever make sense?

Thank you.