In practice, how many samples do you usually take with random search? (I’m defining a sample as a unique combo of hyperparameter values.) For example, is 10 enough, or are 100 or 1000 samples more reasonable? And what exact methodology do you use to implement the coarse-to-fine scheme? Do you just look at the best-performing combo, then readjust the ranges you’re sampling from so they sit more tightly around that combo’s values?

With random search, the number of samples always depends on the network’s complexity and the computational power available. If, for example, the neural network is large and processes a lot of data, it might be reasonable to go with 10 searches or even fewer. If you have the computational capacity, you can then perform more random searches as a second iteration around a promising choice of parameters. Choosing which area to focus on may not be straightforward, though, since it depends on the model and where its hyperparameter optima lie.
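A minimal sketch of that two-pass coarse-to-fine idea, assuming a hypothetical `score` function standing in for train-and-validate, and assuming two hyperparameters (learning rate sampled on a log scale, plus a layer count):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_config(lr_range, layers_range):
    """Sample one hyperparameter combo; learning rate on a log scale."""
    lr = 10 ** rng.uniform(np.log10(lr_range[0]), np.log10(lr_range[1]))
    layers = int(rng.integers(layers_range[0], layers_range[1] + 1))
    return lr, layers

def score(lr, layers):
    # Hypothetical stand-in for "train the model, return validation accuracy".
    return -(np.log10(lr) + 3) ** 2 - 0.1 * (layers - 4) ** 2

# Coarse pass: sample broadly over wide ranges.
coarse = [sample_config((1e-5, 1e-1), (2, 8)) for _ in range(20)]
best_lr, best_layers = max(coarse, key=lambda c: score(*c))

# Fine pass: tighter ranges centered on the coarse winner.
fine = [sample_config((best_lr / 3, best_lr * 3),
                      (max(2, best_layers - 1), min(8, best_layers + 1)))
        for _ in range(20)]
best_lr, best_layers = max(fine, key=lambda c: score(*c))
```

The budget split (20 + 20 here) and the shrink factor for the fine ranges are arbitrary choices; in practice they would be set by your compute budget.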

So basically as many searches as you can reasonably do given computational constraints. Follow-up question: do you randomly sample with or without replacement? I don’t even know if sampling without replacement is possible in NumPy when all you have is a range to sample from. Also, I had a pretty small range for one of my hyperparameters (num_layers, range=(2, 8)), so if I search 10 times, sampling without replacement is impossible — there are only 7 distinct values.
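For what it’s worth, NumPy does support sampling a range without replacement via `Generator.choice(..., replace=False)`, and it raises a `ValueError` exactly in the situation described above — when you ask for more draws than there are distinct values:

```python
import numpy as np

rng = np.random.default_rng(0)
values = np.arange(2, 9)  # num_layers candidates: 2..8, i.e. 7 distinct values

# With replacement (duplicates possible) — 10 draws from 7 values is fine.
with_rep = rng.choice(values, size=10, replace=True)

# Without replacement — at most 7 draws; all results are distinct.
without_rep = rng.choice(values, size=7, replace=False)

# Asking for 10 draws without replacement from 7 values raises ValueError.
try:
    rng.choice(values, size=10, replace=False)
except ValueError:
    pass  # expected: sample larger than population
```

Note that for continuous hyperparameters (like a learning rate drawn from a real interval) the question barely matters: the chance of drawing the same float twice is negligible, so sampling with replacement is the norm.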

Random search is not the best approach — there are more efficient ones, such as Bayesian optimization, grid search, and others that use smarter algorithmic strategies rather than just random sampling.

The reason I am asking about random search is that Andrew teaches in this course that it is much more efficient than grid search: with a budget of n trials, random search explores n distinct values of each hyperparameter, whereas grid search needs on the order of n^c trials to do the same, where c is the number of hyperparameters being searched. He does not mention Bayesian search methods.
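The coverage argument can be seen with a tiny sketch (hypothetical ranges): with the same budget of 25 trials over two hyperparameters, a 5×5 grid only ever tests 5 distinct learning-rate values, while random search tests 25:

```python
import numpy as np

n = 25  # total trial budget

# Grid search: a 5 x 5 grid over (log learning rate, num_layers).
# Each hyperparameter is probed at only 5 distinct values.
grid_log_lr = np.repeat(np.linspace(-5, -1, 5), 5)   # 25 trials, 5 unique lrs
grid_layers = np.tile(np.arange(2, 7), 5)            # 25 trials, 5 unique counts

# Random search: every trial uses a fresh learning-rate value.
rng = np.random.default_rng(0)
rand_log_lr = rng.uniform(-5, -1, size=n)            # 25 trials, ~25 unique lrs

print(len(set(grid_log_lr.tolist())))  # distinct lr values tried by grid
print(len(set(rand_log_lr.tolist())))  # distinct lr values tried by random
```

The practical payoff is that when one hyperparameter matters much more than the other, random search effectively spends all 25 trials probing it, while the grid wastes most trials re-testing the same 5 values.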

My thought was that once you have already narrowed things down to a certain area, maybe grid search would be more effective there — but he is the Prof here.