In the lectures, Andrew mentions that neural networks are generally low-bias machines and that increasing a network's complexity will almost always improve its performance.
However, in the Week 3 programming assignment (lab), in the high-variance case, a simple neural network with the following architecture:
Dense layer with 6 units, relu activation
Dense layer with 6 units and a linear activation
outperforms a complex neural network with the following architecture:
Dense layer with 120 units, relu activation
Dense layer with 40 units, relu activation
Dense layer with 6 units and a linear activation (not softmax)
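For concreteness, here is a sketch of the two architectures in tf.keras. This assumes 2 input features (matching the lab's dataset shape mentioned below); variable names are illustrative, not the lab's own:

```python
import tensorflow as tf

# Simple network: 6 -> 6, linear output (logits)
simple_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),          # assumed 2 input features
    tf.keras.layers.Dense(6, activation="relu"),
    tf.keras.layers.Dense(6, activation="linear"),
])

# Complex network: 120 -> 40 -> 6, linear output (logits)
complex_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),          # assumed 2 input features
    tf.keras.layers.Dense(120, activation="relu"),
    tf.keras.layers.Dense(40, activation="relu"),
    tf.keras.layers.Dense(6, activation="linear"),
])
```

The linear output layer produces logits, which would then be paired with a softmax-from-logits loss during training.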
Please note that the “complex” neural network here is the unregularized version. Still, since it is a neural network, I would expect the complex version to perform better than the simple one.
I’m having trouble understanding this. Please help me understand the result.
Sure, it's in the lecture “Bias/variance and neural networks”, at timestamp 1:41–1:55.
Another thing I missed mentioning above: his exact statement was,
“Large neural networks, when trained on small to moderate sized datasets, are low bias machines.”
Since low bias and high variance are essentially the same thing, and we know that collecting more data addresses high variance, I don't think the statement's validity should change when the dataset is huge.
I do not think Andrew really said that “increasing the complexity of a neural network will almost always improve its performance.” There must be some context. If the dataset is huge, increasing the network's complexity might give you better performance, but that won't always be true if you only have a small dataset.
In general, if the dataset is too small and the number of parameters to train is huge, e.g. in a complex neural network (or any other machine learning model), there isn't enough data to train the model well, and we will most likely get an overfitted model.
But for your specific assignment: what metrics did you use to compare model performance? How big is the dataset? What “complex neural network” did you compare against?
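This small-data/many-parameters failure mode can be illustrated with a toy example (not the lab's data): a degree-9 polynomial has just enough parameters to pass through 10 noisy training points exactly, so training error is near zero while error on held-out points is much larger:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 noisy training samples of a sine curve -- a tiny dataset
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(10)

# Held-out points from the noise-free underlying function
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test)

# Degree-9 polynomial: 10 coefficients for 10 points -> exact interpolation
coeffs = np.polyfit(x_train, y_train, deg=9)

train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

# Training error is essentially zero, held-out error is not: overfitting
print(train_mse, test_mse)
```

The same effect is what the lab demonstrates: the 5,446-parameter network has far more capacity than 400 training examples can constrain, unless it is regularized.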
In my case, the X dataset has shape (800, 2), which is split into a training set (400, 2), a cross-validation set (320, 2), and a test set (80, 2).
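For reference, those splits correspond to a 50% / 40% / 10% partition of the 800 examples. A minimal sketch of such a split (using random placeholder data, not the lab's actual X, and a plain slice rather than whatever splitting utility the lab uses):

```python
import numpy as np

# Placeholder for the lab's dataset: 800 examples, 2 features
X = np.random.default_rng(0).standard_normal((800, 2))

# 400 train / 320 cross-validation / 80 test, as described above
X_train, X_cv, X_test = X[:400], X[400:720], X[720:]
```

In practice the data would be shuffled before slicing so each split is representative.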
The “Complex Neural Network” has the following structure, with 5,446 parameters in total:
Dense layer with 120 units, relu activation
Dense layer with 40 units, relu activation
Dense layer with 6 units and a linear activation (not softmax)
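The 5,446 figure can be checked by hand: a Dense layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases. With the 2 input features from the dataset above:

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in * n_out) plus one bias per unit
    return n_in * n_out + n_out

# Complex network: 2 inputs -> 120 -> 40 -> 6
complex_total = (dense_params(2, 120)    # 360
                 + dense_params(120, 40) # 4840
                 + dense_params(40, 6))  # 246

# Simple network: 2 inputs -> 6 -> 6
simple_total = dense_params(2, 6) + dense_params(6, 6)  # 18 + 42
```

So the complex network has roughly 90 times as many parameters as the simple one, while both are trained on the same 400 examples.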
The “Simple Neural Network” has the following structure, with a total of 60 parameters: