Why does a complex neural network perform worse than a simple neural network in the C2_W3 lab?

As per the lectures, Andrew mentioned that neural networks are generally low-bias machines, and that increasing the complexity of a neural network will almost always improve its performance.

However, in the programming assignment (lab) for Week 3, we find that a simple neural network with the following architecture:

  • Dense layer with 6 units, relu activation
  • Dense layer with 6 units and a linear activation

outperforms a complex, high-variance neural network with the following architecture:

  • Dense layer with 120 units, relu activation
  • Dense layer with 40 units, relu activation
  • Dense layer with 6 units and a linear activation (not softmax)
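For concreteness, here is roughly how I’d write the two models above in Keras. The layer sizes come from the lab; the input shape, loss, and optimizer below are my assumptions:

```python
import tensorflow as tf
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense

# Simple model: 6 relu units -> 6 linear outputs (logits for 6 classes).
simple_model = Sequential([
    Input(shape=(2,)),              # the lab's data has 2 input features
    Dense(6, activation='relu'),
    Dense(6, activation='linear'),
])

# Complex model: 120 -> 40 relu units -> 6 linear outputs.
complex_model = Sequential([
    Input(shape=(2,)),
    Dense(120, activation='relu'),
    Dense(40, activation='relu'),
    Dense(6, activation='linear'),  # linear output, not softmax
])

# Assumed compile settings: a linear output layer pairs with
# from_logits=True, so softmax is applied inside the loss.
for model in (simple_model, complex_model):
    model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    )
```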

Please note that the “complex” neural network here is the un-regularized version (a regularized variant is sketched below for contrast). Still, I feel that since it is a neural network, the complex version should perform at least as well as the simple one.
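For reference, a regularized variant of the complex model might look like the following sketch; the L2 strength of 0.1 is just illustrative, not necessarily what the lab uses:

```python
import tensorflow as tf
from tensorflow.keras import Sequential, Input
from tensorflow.keras.layers import Dense

# Same architecture, with L2 weight penalties on the hidden layers.
regularized_model = Sequential([
    Input(shape=(2,)),
    Dense(120, activation='relu',
          kernel_regularizer=tf.keras.regularizers.l2(0.1)),
    Dense(40, activation='relu',
          kernel_regularizer=tf.keras.regularizers.l2(0.1)),
    Dense(6, activation='linear'),
])
```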

I’m having trouble understanding this. Please help me understand the result.

Thanks
Aditya


Can you give a reference for where he said this?

“low bias” and “high variance” are essentially the same thing.

Sure, it’s in the lecture named “Bias/variance and neural networks”, at timestamp 1:41-1:55.
Another thing to mention, which I missed above, is that his exact statement was:
“Large neural networks, when trained on small to moderate sized datasets, are low bias machines.”

Since low bias and high variance are essentially the same thing, and we know that collecting more data is one way to address high variance, I don’t think the validity of the above statement should change when the dataset is huge.


I do not think Andrew really said “increasing the complexity of a neural network will almost always improve the performance of the neural network.” There must be a context. If the dataset is huge, increasing the complexity of a neural network might give you better performance. It won’t always be true if you only have a small dataset.

In general, if the dataset is too small and the number of parameters to train is huge (e.g., in a complex neural network, or in any other large machine learning model), the model won’t be trained well enough to give you good performance. We will most likely get an over-fitting model.
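One quick way to see this is to compare the classification error on the training set against the cross-validation set; a large gap suggests over-fitting. A minimal sketch, assuming a trained Keras model with a linear (logits) output and data already split into X_train/y_train and X_cv/y_cv (the helper name eval_cat_err here is illustrative):

```python
import numpy as np

def eval_cat_err(y, yhat):
    """Fraction of examples where the predicted class is wrong."""
    return np.mean(y != yhat)

# With a linear (logits) output layer, take argmax over the logits
# to get class predictions.
yhat_train = np.argmax(model.predict(X_train), axis=1)
yhat_cv = np.argmax(model.predict(X_cv), axis=1)

# A training error much lower than the CV error points to high variance.
print(f"train error: {eval_cat_err(y_train, yhat_train):.3f}")
print(f"cv error:    {eval_cat_err(y_cv, yhat_cv):.3f}")
```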

But for your specific assignment: what metrics did you use to compare model performance? How big is the dataset? What is the “complex neural network” you compared against?


In my case, for the assignment, the X dataset has shape (800, 2), which is split into a training set (400, 2), a cross-validation set (320, 2), and a test set (80, 2).
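That corresponds to roughly a 50% / 40% / 10% split of the 800 examples. A minimal sketch of the split, assuming sklearn’s train_test_split and the full dataset in X and y:

```python
from sklearn.model_selection import train_test_split

# First split off 50% for training: 800 -> 400 train, 400 held out.
X_train, X_, y_train, y_ = train_test_split(X, y, test_size=0.50, random_state=1)

# Split the held-out 400 into 320 CV / 80 test (20% of 400 is 80).
X_cv, X_test, y_cv, y_test = train_test_split(X_, y_, test_size=0.20, random_state=1)
```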

The “Complex Neural Network” has the following structure, with a total of 5,446 parameters:

  • Dense layer with 120 units, relu activation
  • Dense layer with 40 units, relu activation
  • Dense layer with 6 units and a linear activation (not softmax)

The “Simple Neural Network” has the following structure, with a total of 60 parameters (both totals are broken down after this list):

  • Dense layer with 6 units, relu activation
  • Dense layer with 6 units and a linear activation
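For what it’s worth, both totals check out as weights plus biases per Dense layer, given the 2 input features:

```python
# Parameters per Dense layer = (inputs * units) + units (biases).
complex_params = (2*120 + 120) + (120*40 + 40) + (40*6 + 6)
simple_params = (2*6 + 6) + (6*6 + 6)
print(complex_params)  # 5446
print(simple_params)   # 60
```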

Just a comment: in the “complex NN”, the first layer takes two input features and turns them into 120 activations.

That’s really not realistic in practice. There is likely not enough information in two input features to support 120 distinct activations.

I’d characterize that as an “overly-complex NN”. Most of those 1st layer neurons are not providing any benefit.

It’s going to be very difficult to train 5,000 parameters using only two input features.

So this isn’t a very useful example, and I would not spend much time worrying about how to compare it to the lectures.


Alright, makes sense now.
Thanks!