Overfitting or Underfitting

Is it possible to overfit a dataset with a small network? I have a dataset of 3600 figures with 6 cases, and I’m training it with an 8-layer network. Should I increase the size of the network to reduce the bias first?


Hi @Chiang_Yuhan

It’s often a good practice to start with a simpler model and gradually increase its complexity if necessary. This allows you to find a balance between underfitting and overfitting. If your current 8-layer network is overfitting, consider reducing its complexity and adding regularization techniques before increasing its size.
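As a sketch of what “adding regularization” can mean in practice (this example is mine, not part of the reply above, and the function name and λ value are made up for illustration): L2 regularization adds a penalty proportional to the sum of squared weights to the loss, which pushes the optimizer toward smaller weights and typically reduces overfitting.

```python
import numpy as np

def l2_regularized_loss(base_loss, weight_matrices, lam=1e-3):
    """Add an L2 (weight-decay) penalty to an existing loss value.

    base_loss       : the unregularized loss (e.g. cross-entropy)
    weight_matrices : list of weight arrays, one per layer (biases usually excluded)
    lam             : regularization strength -- a hyperparameter to tune
    """
    penalty = lam * sum(np.sum(W ** 2) for W in weight_matrices)
    return base_loss + penalty

# Toy example: two small all-ones weight matrices
W1 = np.ones((2, 2))   # sum of squares = 4
W2 = np.ones((1, 2))   # sum of squares = 2
print(l2_regularized_loss(0.5, [W1, W2], lam=0.1))  # ≈ 1.1
```

Frameworks like Keras or PyTorch expose the same idea through built-in options (kernel regularizers, weight decay), so in practice you rarely write this by hand.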


Would you elaborate on what underfitting and overfitting are?

Hi @Wajid_Ali2,

Underfitting would be something like this: you trained a model with 1000 images of cats, but the dataset includes only front-facing images. So when you ask it to make a prediction on a new image (never seen by the model before) in which the cat is facing sideways, the model fails to predict it as a cat. You could say the model is not very general when it comes to predicting cats.

Overfitting would be something like this: you trained a model with 1000 images of cats, but you only had images of black cats. So when you ask it to make a prediction on a new image (never seen by the model before) of a brown cat, the model fails to predict it as a cat. You could say the model has learned that for something to be a cat, it has to be black (and has ignored all the other features that make a cat a cat).

I hope these examples helped.



I think there is a simpler way to state the definitions of under and over fitting:

Underfitting (which Prof Ng also calls ‘high bias’) means that the model achieves poor accuracy even on the training set. That means the model is not powerful enough for the task at hand.

Overfitting (also called ‘high variance’) means that the model gives good accuracy on the training data, but does relatively poorly on any other data, e.g. validation or test data. There can be lots of reasons for that.

Prof Ng discusses these issues and how to deal with them in great detail in DLS Course 2. If you haven’t taken DLS yet, it is a good next step to understand these issues.
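The two definitions above can be written down as a rough diagnostic rule. This is just an illustrative sketch; the threshold values are arbitrary and problem-dependent, not numbers from the course:

```python
def diagnose_fit(train_acc, val_acc, train_target=0.95, gap_tol=0.05):
    """Classify a fit regime from training and validation accuracy.

    train_target : accuracy you consider acceptable on the training set
    gap_tol      : how much worse validation may be before we call it
                   overfitting. Both thresholds are problem-dependent.
    """
    if train_acc < train_target:
        return "underfitting (high bias): weak even on training data"
    if train_acc - val_acc > gap_tol:
        return "overfitting (high variance): good on train, poor elsewhere"
    return "reasonable fit"

print(diagnose_fit(0.70, 0.68))  # underfitting
print(diagnose_fit(0.99, 0.85))  # overfitting
print(diagnose_fit(0.98, 0.96))  # reasonable fit
```

The ordering matters: as discussed later in this thread, you fix high bias first, because the train/validation gap is only meaningful once training accuracy is acceptable.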


What is the difference between them, if both fail to recognize images accurately and both fail to predict on new images?

It means that in the case of underfitting, the model predicts poorly on both the training data (the cat images) and the test data.
In the case of overfitting, the model is accurate on the training data, but it will not be accurate on new data.


From my knowledge of the DLS course, this training graph seems to show overfitting rather than underfitting, right?

I have taken the course before! Following Prof Ng’s suggestion, I think my next step should be to build a bigger network to reduce bias first, right? Since this is a relatively shallow network.

Yes, that seems like a good next step. The training accuracy is only roughly 80%, so that’s probably not good enough. So the first step is to solve that problem and the first thing to try would be a more complex network. You also have the issue that the validation accuracy is worse than the training accuracy, but you’ll have to see what happens once you can come up with a network that will satisfy your requirements for training accuracy. Trying to solve the overfitting problem with training accuracy at 80% would probably end up being a waste of time. The first job is to get good training accuracy and then go from there.

I’m also curious to know more about your dataset. You said you have “3600 figures with 6 cases”, and I’m not sure what you mean by that. Do you mean 3600 input data samples, each of which has 6 features? Just wanted to make sure I was clear on what you mean.


Thank you, sir. Here is a brief description of my network and data:
The vertical axis is the magnitude of the Fourier transform of the signal. The horizontal axis is frequency, up to the Nyquist frequency.
I want my model to perform a classification task with 6 classes (all classes have roughly the same number of examples).
The input to the 8-layer network is an image of size 160 × 160 × 3, or 76,800 features in total. My neural network has roughly 3,500 trainable parameters.


The goal is to train a network that accounts for the relevant salient features of a training set (underfitting is when this is not satisfied) while not also fitting the noise and idiosyncrasies in the training set (overfitting is when this is not satisfied).
Your plot of accuracy vs epoch shows likely overfitting. The accuracy for the validation set has leveled off, but the accuracy for the training set continues to climb.
This is a warning that your model is too complex; the metaphorical smoking gun is when accuracy for the training set increases (or loss decreases) with epoch but accuracy for the validation set decreases (or loss increases).
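A common automated response to that smoking gun is early stopping: end training once validation accuracy has not improved for several consecutive epochs. A minimal sketch (the patience value and the accuracy history below are invented for illustration):

```python
class EarlyStopping:
    """Signal a stop when validation accuracy has not improved
    for `patience` consecutive epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_acc = float("-inf")
        self.epochs_without_improvement = 0

    def should_stop(self, val_acc):
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

stopper = EarlyStopping(patience=3)
history = [0.60, 0.70, 0.75, 0.74, 0.74, 0.73, 0.72]  # val accuracy per epoch
for epoch, acc in enumerate(history):
    if stopper.should_stop(acc):
        print(f"stopping at epoch {epoch}")  # three epochs in a row with no gain
        break
```

Deep learning frameworks ship this as a ready-made callback (e.g. an early-stopping callback monitoring validation loss or accuracy), usually combined with restoring the best weights seen so far.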

How many of your layers are convolutional? (You may have too many.) How many filters does each convolutional layer have, and what size are they? (You may have too many filters in one or more layers, or the filters may be too large.) Do you have more than one fully connected hidden layer? (One is often enough.) How many neurons are in the hidden layer? (You may have too many.)

Look at the number of parameters in each layer. Pareto analysis suggests looking at the layer with the most parameters as a place to consider pruning.
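That layer-by-layer parameter audit can be done by hand with the standard counting formulas. A sketch with made-up layer sizes (these are not the original poster’s actual layers):

```python
def conv2d_params(k, c_in, n_filters):
    # k x k kernels over c_in input channels, plus one bias per filter
    return (k * k * c_in + 1) * n_filters

def dense_params(n_in, n_out):
    # full weight matrix plus one bias per output neuron
    return (n_in + 1) * n_out

# Hypothetical small classifier (layer sizes chosen only for illustration)
layers = {
    "conv1": conv2d_params(3, 3, 8),    # (3*3*3 + 1) * 8  = 224
    "conv2": conv2d_params(3, 8, 16),   # (3*3*8 + 1) * 16 = 1168
    "dense": dense_params(64, 6),       # (64 + 1) * 6     = 390
}
for name, n in layers.items():
    print(f"{name}: {n} parameters")
print("largest layer:", max(layers, key=layers.get))  # conv2
```

In a framework, `model.summary()` (Keras) or iterating over `model.parameters()` (PyTorch) gives the same breakdown; the point is to find the layer that dominates the count and consider pruning there first.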
