How to decide on the number of the hidden units in hidden layers?
If you are asking about linear (fully connected) layers rather than CNNs,
then this is a question of undercomplete vs. overcomplete representations.
As a general rule, pick a power of 2 for the number of hidden units, for example 64 or 128. Powers of 2 are a common convention and tend to be convenient for the hardware.
Undercomplete: fewer hidden units (encoder)
When you do something like image classification, you want to extract the features of the image. For example, in digit classification with 28 by 28 images (28*28=784), you feed in each image as a tensor of shape 1 by 784, and the layer outputs a tensor of shape 1 by hidden, where hidden is the number of hidden units.
So you want the number of hidden units to be less than 784, or even less than 500. The goal is to keep the features that matter and drop the unnecessary parts, such as the background. Most of the pixels in a digit image are background, and you do not want the model to extract the background for image classification. In this lecture, a hidden size of 30 was used for classification.
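To make the size difference concrete, here is a small sketch that counts the weights and biases in a stack of fully connected layers. The function name `dense_param_count` is made up for this illustration, and the 500-unit variant is only a comparison point, not something from the lecture.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


# 784 input pixels -> 30 hidden units -> 10 digit classes
small = dense_param_count([784, 30, 10])
# Same task with a 500-unit hidden layer (hypothetical comparison)
large = dense_param_count([784, 500, 10])
print(small, large)
```

The 30-unit network has far fewer parameters, which is what forces it to keep only a compact set of features.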
Overcomplete: more hidden units (decoder)
When do we increase the number of hidden units? When we want to do a decoding task. For example, you increase the number of hidden units when you want to upsample an image or another input, as in image generation. Example: autoencoders.
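As a rough sketch of the autoencoder case, the decoder often just mirrors the encoder, so the layer widths grow back toward the input size. The helper `mirror_widths` and the intermediate width of 128 are assumptions made for this example only.

```python
def mirror_widths(encoder_widths):
    """Return decoder layer widths as the encoder widths in reverse order."""
    return list(reversed(encoder_widths))


# Encoder shrinks 784 -> 128 -> 30 (128 is an illustrative choice)
encoder_widths = [784, 128, 30]
# Decoder grows back from the 30-unit code to the 784-pixel image
decoder_widths = mirror_widths(encoder_widths)
print(decoder_widths)
```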
Sometimes an algorithm is used to pick the number of hidden units from a set of candidate values.
For a CNN doing an encoding task (image classification), you decrease the spatial size of the image and increase the number of channels. In the end, you have a feature map of size 1 by 1 with a large number of channels, because the network has extracted the features of the image.
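The shrinking size and growing channel count can be traced with the standard convolution output-size formula, floor((size + 2*padding - kernel) / stride) + 1. The kernel size, stride, padding, and starting channel count below are assumptions chosen for illustration, and `encoder_shapes` is a hypothetical helper.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shapes(size=28, channels=16):
    """List (channels, height, width) after each stride-2 conv until 1 by 1."""
    shapes = []
    while size > 1:
        size = conv_output_size(size)
        shapes.append((channels, size, size))
        channels *= 2  # double the channels each time the image shrinks
    return shapes


for shape in encoder_shapes():
    print(shape)
```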
Thank you for your response, @JonathanSum
In the undercomplete image classification example, I understand that the number of hidden units should be less than 784. Does that mean I need to train my model starting from fewer units, say 16, and go up in powers of 2, e.g. 16->32->64->…->512? Or is there a way to find a sweet spot?
Thank you in advance.
The layer sizes generally go down: 784->512->256->…[number of classes] for classification.
In the lecture's case, it started at 30 rather than 500 or 512 because most of the image is background.
You need to run an experiment to see what number is the best for the starting point.
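Here is one way such an experiment could be organized, as a sketch only. `halving_widths` builds a shrinking layer-size sequence, and `pick_hidden_size` tries each candidate and keeps the best one. Both names are made up for this example, and `evaluate` stands for your own function that trains a model with the given hidden size and returns a validation score.

```python
def halving_widths(n_inputs, n_classes):
    """Layer widths that shrink by powers of two from n_inputs to n_classes."""
    width = 1
    while width * 2 < n_inputs:
        width *= 2  # largest power of two below n_inputs
    widths = [n_inputs]
    while width > n_classes:
        widths.append(width)
        width //= 2
    widths.append(n_classes)
    return widths


def pick_hidden_size(candidates, evaluate):
    """Score every candidate with evaluate() and return the best one."""
    scores = {h: evaluate(h) for h in candidates}
    best = max(scores, key=scores.get)
    return best, scores


print(halving_widths(784, 10))
candidates = [16, 32, 64, 128, 256, 512]  # starting sizes to compare
```

You would then call `pick_hidden_size(candidates, evaluate)` with your own training and validation routine, and compare the scores it returns.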
Click me to see the algorithm
Sorry, I missed the link earlier. Thank you for the explanation, though.