In week 3, video 2, titled *Using an Appropriate Scale to pick Hyperparameters*, what does the sentence in the attached screenshot mean?

I think that if you keep watching the video and listen carefully to the examples Andrew gives, you will understand what he means by that sentence. Still, knowing that I cannot be as clear as he is, I’ll try to explain it in a simple way.

Sampling uniformly at random means that every possible value within the range you are picking from is equally likely to be selected. Take the example from the video, where the number of layers is between 2 and 4: each value (2, 3 or 4) has the same probability of being selected (1/3 in this case), so the probability distribution over the number of layers is uniform.
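As a quick check of the layers example, here is a minimal sketch using only Python's standard library. The sample size and the seed are my own choices for illustration, not something taken from the video.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

# Draw the number of layers uniformly from {2, 3, 4}.
# random.randint includes both endpoints.
samples = [random.randint(2, 4) for _ in range(30_000)]

# Fraction of draws that landed on each value.
counts = Counter(samples)
fractions = {k: counts[k] / len(samples) for k in sorted(counts)}
print(fractions)  # each fraction should come out close to 1/3
```

With enough draws, the three fractions should be roughly equal, which is what "uniformly at random" means here.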

For other hyperparameters, such as the learning rate, Andrew suggests a different approach, in which the probability over the range of that hyperparameter, say 0.0001 to 1.0, is not uniform. The effect is that some sub-ranges of that interval are more likely than others to be selected for exploration.
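To make this concrete, assume (this is my reading of the strategy in the video) that an exponent $r$ is drawn uniformly from $[-4, 0]$ and the learning rate is set to $\alpha = 10^{r}$. Then for any $0.0001 \le a \le b \le 1$:

$$P(a \le \alpha \le b) = P(\log_{10} a \le r \le \log_{10} b) = \frac{\log_{10} b - \log_{10} a}{4}$$

So under that assumption each factor-of-ten interval, such as 0.0001 to 0.001 or 0.1 to 1, gets probability 1/4, and the interval from 0.0001 to 0.1 gets 3/4 in total.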

You can check this with a small experiment: generate 100 values following the strategy Andrew proposes in the video, and plot the distribution of the observed values.
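A minimal sketch of such an experiment follows. It assumes the strategy is to draw an exponent uniformly between -4 and 0 and raise 10 to that power. To stay dependency-free it uses the standard library and prints a text histogram instead of drawing a plot; the seed and the ten linear bins are my own choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Assumed strategy: draw the exponent r uniformly in [-4, 0] and set
# alpha = 10 ** r, so alpha lies between 0.0001 and 1.0.
alphas = [10 ** random.uniform(-4, 0) for _ in range(100)]

# Count how many samples land in each of ten equal-width bins
# on a linear scale: 0.0-0.1, 0.1-0.2, ..., 0.9-1.0.
bins = [0] * 10
for a in alphas:
    index = min(int(a * 10), 9)  # guard so a value at 1.0 stays in the last bin
    bins[index] += 1

# Text histogram: one '#' per sample in the bin.
for i, count in enumerate(bins):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f}: {'#' * count} ({count})")
```

The first bin (0.0 to 0.1) should hold most of the 100 samples, while the remaining nine bins share the rest.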

As you can see by plotting the results, far more of the sampled values fall in the range 0 to 0.1 than in any other part of the interval, so the sampling is still random, but it is not uniformly random.