I noticed that the layers in the lectures and labs are generally decreasing in size. I understand that for most projects the input layer would probably have more nodes than the output layer, but is it a good rule of thumb for the middle layers to decrease? … or do NNs often have middle layers that increase before decreasing?
Another related question is how many layers should be used for a given project, but I’m thinking that may be covered in another course? Or maybe we add layers until bias goes away?
Hey @evoalg,
I will try to answer this in terms of standard neural networks, and not in terms of Convolutional Neural Networks (CNNs), since they haven’t been discussed in this specialization. They also have an additional attribute, the number of channels, so let’s skip them for now.
As you said, in most applications the number of inputs is larger than the number of outputs, so at some point your neural network has to start decreasing the number of nodes to reach the required number of output nodes. However, there is no hard rule that the middle layers must always decrease consistently.
As long as your network meets the required number of input and output nodes, you are free to choose as many nodes in the middle layers as you want. That being said, empirically, neural networks whose layer sizes decrease consistently are often found to perform better. One intuition I can offer is that when you set the number of nodes in a layer to a large value, your model will have larger weight matrices, and hence will be more prone to over-fitting, so it won’t perform to its full potential. So, run some experiments with differing architectures and you will get to see this for yourself (a small sketch follows below).
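To make that concrete, here is a minimal sketch comparing a “funnel” architecture with one that widens in the middle. I’m using Keras here purely for convenience (the course labs use NumPy, so treat the library choice, the 1024-feature input, and the layer sizes as assumptions for illustration); `model.summary()` will show how much larger the weight matrices get when a middle layer is wide.

```python
import tensorflow as tf

def build_model(hidden_units):
    """Fully-connected binary classifier; hidden_units is a list of hidden-layer sizes."""
    layers = [tf.keras.Input(shape=(1024,))]                  # assumed 1024 input features
    for n in hidden_units:
        layers.append(tf.keras.layers.Dense(n, activation="relu"))
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))  # single output node
    return tf.keras.Sequential(layers)

# A "funnel" that shrinks towards the output ...
funnel = build_model([256, 64, 16])
# ... versus one that widens in the middle before shrinking.
bulge = build_model([256, 512, 16])

funnel.summary()  # fewer parameters
bulge.summary()   # many more parameters -> larger weight matrices, easier to over-fit
```

Both satisfy the same input/output constraints; the difference is only in how many parameters you give the model to fit.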
This depends completely on your dataset and the required performance. As you pointed out, increasing the number of layers until the bias becomes sufficiently low is one way to go, but make sure you keep the variance in check as well, because as Prof Andrew mentioned in the videos, very large networks on small datasets tend to overfit. So, once again, we are riding the see-saw between bias and variance (see the sketch below for one way to monitor both).
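Here is a hedged sketch of that “add layers while watching bias and variance” loop, reusing the `build_model` helper from the previous snippet. It assumes `X_train, Y_train, X_dev, Y_dev` are already loaded and preprocessed, and the depths, layer width, and epoch count are arbitrary choices for illustration, not a recipe.

```python
# Scan over network depth and compare training error (bias) with dev error (variance).
for depth in range(1, 6):
    model = build_model([64] * depth)          # `depth` hidden layers of 64 units each
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(X_train, Y_train, epochs=20, verbose=0,
                        validation_data=(X_dev, Y_dev))
    train_err = 1 - history.history["accuracy"][-1]
    dev_err = 1 - history.history["val_accuracy"][-1]
    # High train_err            -> high bias: a deeper/larger network may help.
    # dev_err >> train_err      -> high variance: more data or regularization, not more layers.
    print(f"{depth} hidden layers: train error {train_err:.3f}, dev error {dev_err:.3f}")
```

The point of the loop is just the comparison at the end: you stop adding layers once the training error is acceptable, and you back off (or regularize) once the gap between dev and training error starts growing.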