Hello! In my recent project I have run into a high-bias problem. It is a multi-class classification task. I used two neural networks: a 4-layer shallow network and a 150-layer MobileNet V2 to which I applied transfer learning. Both achieved similar accuracy (0.7) after 100 epochs. I learned from DLS that a larger network should reduce bias, but it didn't in my case. I would like to hear your thoughts on this!
Here are some things about my implementation:
I have 600 cases: 100 cases in each of 6 classes.
I used 80% for training and 20% for validation (I don't need a test set currently).
Batch size is 16 for the shallow network and 32 for the transfer-learning network.
Each is trained for 100 epochs.
Here are some random visualizations of my training.
Although the dataset is small, the cases within each class are quite similar to one another.
Did you try running this model with a 60% training / 40% validation split, since you mentioned you are not using a test set currently?
From what I believe or have learnt, if the dataset is small (in the hundreds), the common approach is a 33%-33%-33% split. But since you are not using a test set, you could try 60%-40% and check whether the accuracy changes.
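Just to make the suggestion concrete: in practice scikit-learn's `train_test_split` (with `stratify=y`) is the usual one-liner for this, but here is a plain NumPy sketch of a 60/40 split. The array shapes and seed are hypothetical placeholders for your 600 cases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the 600 cases across 6 balanced classes (hypothetical shapes).
X = rng.random((600, 8))          # pretend each row is one flattened case
y = np.repeat(np.arange(6), 100)  # 100 labels per class

# Shuffle the indices, then cut at 60%: 360 training cases, 240 validation cases.
idx = rng.permutation(len(X))
cut = int(0.6 * len(X))
train_idx, val_idx = idx[:cut], idx[cut:]
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
print(len(X_train), len(X_val))  # 360 240
```

With classes this balanced a plain shuffle is usually fine, but a stratified split guarantees every class keeps its 1/6 proportion in both halves.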
My first reaction is that 100 samples per class may be too few for the DNN to learn from. Perhaps the pre-trained one can do a bit better, but I still think 100 samples is a very low number. If you train for too many batches, you risk overfitting. Any chance you can get more samples per class?
Thank you for this reply! I'm interested in the mathematical aspect of your thoughts. Could you elaborate or link some literature for me to dig into? Many thanks.
It takes a lot of time to create samples, but I am experimenting with data augmentation to enlarge the set. How big a set should I aim for if I use a simple 8-layer implementation? Or a 20-layer one?
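On the augmentation side, the usual route is a library pipeline (e.g. Keras's `RandomFlip`/`RandomRotation` layers), but the core idea fits in a few lines of NumPy: each epoch sees randomly perturbed variants of the same images, which acts like a larger dataset. This is a minimal sketch with hypothetical image shapes; real pipelines add rotations, crops, and color jitter.

```python
import numpy as np

def augment_batch(images, rng):
    """Randomly flip each image left-right and/or up-down.

    images: array of shape (batch, height, width, channels).
    Returns a new array; each flip is applied independently with probability 0.5.
    """
    out = images.copy()
    for i in range(len(out)):
        if rng.random() < 0.5:
            out[i] = out[i][:, ::-1]  # horizontal flip (reverse width axis)
        if rng.random() < 0.5:
            out[i] = out[i][::-1]     # vertical flip (reverse height axis)
    return out

rng = np.random.default_rng(0)
batch = rng.random((16, 96, 96, 3))   # hypothetical batch of 16 images
augmented = augment_batch(batch, rng)
print(augmented.shape)  # (16, 96, 96, 3)
```

Note that flips only make sense if a mirrored image is still a valid member of its class, so the right set of augmentations depends on your data.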
If you have taken the DLS specialisation, Professor Ng explains data-splitting ratios. You can refer to Course 2, Improving Deep Neural Networks, the first week's videos.
Especially when the dataset is this small, a poor split ratio can cause the model to underfit the data.
A general rule of thumb is to have at least ten times as many rows (data points) as features (columns) in your dataset. This means that if your dataset has 10 features, you should have at least 100 rows for reasonable results.
I've taken the course! Are you talking about a single data point/picture, or about all of the training examples? If I'm correct, if my picture is 10x10 pixels, I should have 10 cases for training, right? (If I concatenate them row-wise.)
Choosing a 10x10-pixel image for this analysis is a questionable choice.
It's just an analogy to make it easy to understand, because I don't know whether I understood correctly.
When I was telling you about the general rule of applying ten times to the data points, I meant it in terms of rows and columns.
You mentioned a 10x10-pixel image: that is an image description in which each pixel has a different grid value, and your dataset would in turn depend on that.
You already mentioned in your original post that your network is shallow.
You also mentioned this in your comment.
These are contrasting statements.
Imagine you have 6 classes with only 100 cases each: you are not giving your model many samples from which to learn the features it needs for better accuracy.
For a better analysis, the amount of data is as important as having features that capture the variation your algorithm needs to learn.
You can refer to these videos from our Learning AI community.
In the same course there is also a video on bias and variance. You can refer to both.
Here are some of the pointers I learnt in the MLS course. They might help you.
In the image below, baseline performance depends on the analysis: it can be human-level performance, or the performance of other well-established models. If your model's training error is much higher than the baseline, that is a sign the model has high bias (has underfit).
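That diagnosis can be written down as a simple comparison. This is a sketch of the idea from the course; the error values and the 0.1 gap threshold are hypothetical placeholders (0.3 training error corresponds to the 0.7 accuracy from the original post):

```python
baseline_error = 0.05  # e.g., human-level performance on the task
train_error    = 0.30  # 0.7 training accuracy -> 0.3 error
val_error      = 0.32

gap = 0.1  # hypothetical threshold for "much higher"

if train_error - baseline_error > gap:
    print("high bias: training error is far above the baseline (underfitting)")
if val_error - train_error > gap:
    print("high variance: validation error is far above training error (overfitting)")
```

With these placeholder numbers only the first branch fires, which matches the high-bias situation described in the original post.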
Hope you have a better understanding now. So, since your network is shallow, look into additional features and reduce your learning rate.