I’m almost at the end of the MLS course, and I still don’t understand how to choose the number of units per hidden layer.
I recall earlier in the course there was a slide that I think touched on this, i.e. experiment with neural networks to figure out the number of hidden layers - I assume this also means the number of units in each hidden layer?
If so, the number of permutations (varying number of units AND number of networks) can be mind-boggling. Is there some logical approach to determine the number of units per layer?
For instance, in this week’s lab we have 256 units, then 128 units, in the first two hidden layers. We are dealing with hundreds of movies and hundreds of users. Why 256 and 128? If I were to look experimentally for the number of units and layers, where would I even start?
Thank you.
It’s by experiment. Here are some of my personal rules of thumb; they should be good enough to get you started on developing your own intuition:
1- Start with one hidden layer.
2- Start with sqrt(n) units in that hidden layer (n is the number of input features). It’s a relatively safe starting point.
3- Train (initially without regularization), and evaluate on a test set.
4a- If you get good enough test results, cut the number of hidden layer units in half and go back to step 3.
4b- If you don’t get good enough results, double the number of hidden layer units and go back to step 3.
5- After you have tried several different sizes of hidden layer units, and if you don’t get good enough results, then consider adding a second hidden layer. Go back to Step 2.
6- Once you get good enough results, add some regularization and go back to Step 3.
7- Continue iterating steps 2 through 6 until you get good enough results.
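If it helps to see the loop written down, here is a rough sketch of steps 2 through 4 for a single hidden layer. It is only one way to read the rules above, not a definitive recipe. The names `initial_units`, `search_hidden_units`, `train_and_score`, `target`, and `max_rounds` are all made up for illustration: `train_and_score(units)` stands in for whatever code trains your network with that many hidden units and returns a test score (higher is better), and `target` is your own definition of "good enough".

```python
import math


def initial_units(n_features):
    """Step 2: start near sqrt(n), with at least one unit."""
    return max(1, round(math.sqrt(n_features)))


def search_hidden_units(n_features, train_and_score, target, max_rounds=8):
    """Steps 2-4 for one hidden layer (a sketch, not a tuned procedure).

    train_and_score is a hypothetical callback supplied by you.
    Returns (best_units, history), where best_units is the smallest
    size tried that met the target, or None if no size did.
    """
    units = initial_units(n_features)
    history = {}  # units -> score, in the order the sizes were tried
    best = None
    for _ in range(max_rounds):
        if units in history:
            break  # this size was already tried, nothing new to learn
        score = train_and_score(units)  # step 3
        history[units] = score
        if score >= target:
            best = units if best is None else min(best, units)
            if units == 1:
                break  # cannot halve any further
            units = max(1, units // 2)  # step 4a: try half as many
        else:
            if best is not None:
                break  # the smaller size failed, keep the last good one
            units *= 2  # step 4b: try twice as many
    return best, history


# Toy stand-in for real training: pretend 8 or more units is "good enough".
def toy_scorer(units):
    return 0.9 if units >= 8 else 0.5


best, history = search_hidden_units(400, toy_scorer, target=0.85)
print(best, list(history))  # 10 [20, 10, 5] with this toy scorer
```

If `best` comes back as `None`, that corresponds to step 5: no size you tried was good enough, so consider adding a second hidden layer. Regularization (step 6) would live inside your own `train_and_score`.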
The key point to consider is that you don’t need the optimum perfect solution - you just need one that is good enough to solve the problem you’re working on.
After you get some experience, you can ignore my rules of thumb and create your own.
This is really good, solid advice - thank you! I copied and saved this post
Just a thought - I think it’d be really useful to add this to the lecture where this is discussed
Everyone in machine learning has their own rules of thumb, based on their experience, the tools they use, and the types of problems they work on.
I would not presume to add mine into the course materials.
I understand. My point was that it’d be useful to have some rules of thumb of where to start wrt numbers of layers and units in the NN lectures. Best practices perhaps. Because this is a frequently asked question (based on what I’ve seen on the forums here) and I found that a cursory mention of this in the lectures was not enough for newbies like me.
Just a suggestion on how to improve this course - we are asked to do this