In the lectures, Andrew mentions that neural networks are generally low-bias machines, and that increasing the complexity of a neural network will almost always improve its performance.

However, in the Week 3 programming assignment (lab), in the high-variance case, a simple neural network with the following structure:

Dense layer with 6 units, relu activation

Dense layer with 6 units and a linear activation
outperforms a complex neural network with the following structure:

Dense layer with 120 units, relu activation

Dense layer with 40 units, relu activation

Dense layer with 6 units and a linear activation (not softmax)
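For reference, the two architectures can be sketched in Keras roughly as below. This is an illustrative reconstruction, not the lab's code verbatim; the 2-feature input shape matches the (800x2) dataset described later in the thread, and the variable names are my own.

```python
# Illustrative Keras sketch of the two architectures, assuming 2 input features.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input

simple_model = Sequential([
    Input(shape=(2,)),
    Dense(6, activation="relu"),
    Dense(6, activation="linear"),   # linear output, not softmax
])

complex_model = Sequential([
    Input(shape=(2,)),
    Dense(120, activation="relu"),
    Dense(40, activation="relu"),
    Dense(6, activation="linear"),   # linear output, not softmax
])

print(simple_model.count_params())   # 60
print(complex_model.count_params())  # 5446
```

The linear output layer pairs with a loss such as `SparseCategoricalCrossentropy(from_logits=True)`, which is the numerically stable pattern the course uses for multi-class outputs.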

Please note that the "complex" neural network here is the un-regularized version. Still, I feel that since it is a neural network, it should perform better when it is complex rather than simple.

I'm having trouble understanding this. Please help me understand the result.

Sure, it's in the lecture titled "Bias/variance and neural networks", at timestamp 1:41-1:55.
Another thing I missed mentioning above: his exact statement was,
"Large neural networks, when trained on small to moderate sized datasets, are low bias machines."

Since low bias and high variance are essentially the same thing, and we know that collecting more data is the fix for high variance, I don't think the validity of the above statement should change even when the dataset is huge.

I do not think Andrew really said "increasing the complexity of a neural network will almost always improve the performance of the neural network." There must be a context. If the dataset is huge, increasing the complexity of a neural network might give you better performance. It won't always be true if you only have a small dataset.

In general, if the dataset is too small and the number of parameters to train is huge (e.g., a complex neural network, or any other machine learning model), the model won't have enough data to be trained well. We will most likely end up with an over-fitting model.
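Relatedly, since the "complex" model discussed above is the un-regularized version: one standard way to let a high-capacity network keep its low bias without over-fitting a small dataset is to add an L2 penalty per layer. A hedged sketch, where the lambda value 0.1 is an illustrative assumption, not necessarily the lab's exact choice:

```python
# Same complex architecture, but with an L2 weight penalty on the hidden layers.
# lambda = 0.1 is illustrative; the lab may use a different value.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.regularizers import L2

regularized_model = Sequential([
    Input(shape=(2,)),
    Dense(120, activation="relu", kernel_regularizer=L2(0.1)),
    Dense(40, activation="relu", kernel_regularizer=L2(0.1)),
    Dense(6, activation="linear"),
])
```

Regularization shrinks the weights rather than removing parameters, so the parameter count stays the same while the effective model complexity goes down.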

But back to your specific assignment: what metrics did you use to compare model performance? How big is the dataset? What "complex neural network" did you compare it with?

In my case for the assignment, the dataset X has dimension (800x2), split into a training set (400x2), a cross-validation set (320x2), and a test set (80x2).
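A 400/320/80 split like that can be produced with two calls to scikit-learn's `train_test_split`; the random data below is a stand-in for the lab's actual dataset, and the split mechanics are my sketch, not necessarily how the lab does it:

```python
# Reproducing a 400/320/80 train/CV/test split on an (800, 2) dataset.
# The data here is random filler standing in for the lab's X and y.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(800, 2)
y = np.random.randint(0, 6, size=800)

# First carve off 400 rows for training, then split the rest 320/80.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=400, random_state=1)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, train_size=320, random_state=1)

print(X_train.shape, X_cv.shape, X_test.shape)  # (400, 2) (320, 2) (80, 2)
```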

The "complex neural network" has the following structure, with a total of 5446 parameters:

Dense layer with 120 units, relu activation

Dense layer with 40 units, relu activation

Dense layer with 6 units and a linear activation (not softmax)
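As a sanity check, both totals follow from the usual Dense-layer count, (n_in + 1) * n_out (weights plus one bias per unit), given the 2 input features of the (800x2) dataset:

```python
def dense_params(n_in, n_out):
    # Each unit has n_in weights plus one bias term.
    return (n_in + 1) * n_out

# Complex network: 2 -> 120 -> 40 -> 6
complex_total = dense_params(2, 120) + dense_params(120, 40) + dense_params(40, 6)
print(complex_total)  # 360 + 4840 + 246 = 5446

# Simple network: 2 -> 6 -> 6
simple_total = dense_params(2, 6) + dense_params(6, 6)
print(simple_total)   # 18 + 42 = 60
```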

The â€śSimple Neural Networkâ€ť has the following structure, with a total of 60 parameters: