Hello,
When training a large neural network, trying many hyperparameter combinations takes a long time.
Is it a good idea to take a random sample of the data, train on that to select the best set of hyperparameters, and then retrain on the whole dataset?
Yes, Prof Ng spends quite a bit of time discussing how to choose hyperparameters in DLS Course 2. Just checking my notes, he makes several recommendations:
Rather than using a “grid” system and painstakingly enumerating all the positions, he suggests taking random samples from the search space as your experimental process. For the purpose of hyperparameter tuning, it’s also better to work with a subset of your data if you have a large amount of it.
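As an illustration (this is not code from the course), here’s a minimal sketch of random search over a hyperparameter space. `train_and_evaluate` is a hypothetical stand-in for your real training loop on a data subset, and the ranges are just placeholders:

```python
import random

def train_and_evaluate(config):
    """Hypothetical stand-in: train on a subset of the data with the
    given hyperparameters and return a validation score."""
    return random.random()  # replace with your actual training loop

def sample_config():
    """Draw one random combination instead of enumerating a grid."""
    return {
        "hidden_units": random.randint(50, 300),  # example range
        "num_layers": random.randint(2, 5),       # example range
        "dropout": random.uniform(0.0, 0.5),      # linear scale is fine here
    }

# Each trial is an independent random draw, so every run explores a new
# value of every hyperparameter (unlike a grid, which reuses values).
trials = [(train_and_evaluate(c), c) for c in (sample_config() for _ in range(20))]
best_score, best_config = max(trials, key=lambda t: t[0])
print(best_score, best_config)
```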
Then go from “coarse” to “fine” as you “home in” on the combinations that seem to be working better.
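Continuing the sketch above, “coarse to fine” just means re-sampling in a tighter window around the best result from the coarse pass (the window widths here are arbitrary):

```python
def sample_config_fine(best):
    """Re-sample in a narrower band around the best coarse-pass config."""
    return {
        "hidden_units": random.randint(max(1, best["hidden_units"] - 30),
                                       best["hidden_units"] + 30),
        "num_layers": best["num_layers"],  # freeze what already looks settled
        "dropout": random.uniform(max(0.0, best["dropout"] - 0.1),
                                  min(0.5, best["dropout"] + 0.1)),
    }
```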
He also discusses how to select the appropriate “scale” when doing the coarse-to-fine process: for some types of parameters you can sample uniformly on a linear scale, but for others it’s better to use a log scale.
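For example, sampling a learning rate uniformly between 0.0001 and 1 would put about 90% of the samples above 0.1; sampling the exponent instead gives each decade equal coverage. A sketch in the spirit of those lectures (the exact exponent ranges are assumptions, adjust them to your problem):

```python
import numpy as np

# Log scale for the learning rate: sample the exponent uniformly,
# so each decade (1e-4..1e-3, 1e-3..1e-2, ...) is equally likely.
r = np.random.uniform(-4, 0)   # exponent range is an assumption
learning_rate = 10 ** r        # anywhere from 1e-4 to 1

# Same idea for beta in exponentially weighted averages, which matters
# most near 1: sample 1 - beta on a log scale instead.
r = np.random.uniform(-3, -1)
beta = 1 - 10 ** r             # 0.9 .. 0.999, denser near 1
```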
If you took DLS Course 2, it might be worth going back and listening to those lectures again: three lectures right at the beginning of Week 3, starting with the one titled “Tuning Process”. Or if DLS Course 2 is new to you, you can subscribe in “audit” mode for free and just listen to those lectures. There is a lot of other interesting and worthwhile material there as well.