Adding more data after error analysis

Seungjun_Lee · September 22, 2022, 2:08pm

How do we know that adding more data for a category that performed poorly in error analysis will increase the model’s accuracy?

Does that mean we don’t need to check whether the model has high bias or high variance?

rmwkwok · September 22, 2022, 2:49pm

Hello Seungjun @Seungjun_Lee,

We definitely want to check whether the model has high bias and/or high variance. That is why, when we say a model performs poorly on a certain category, we can additionally ask whether it performs poorly on both the training and cv sets (high bias), or performs worse on the cv set than on the training set (high variance).
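A minimal sketch of that per-category check, in Python. The function names, the `baseline_err` reference level, and the `tol` threshold of 0.05 are all assumptions made for illustration; pick values that suit your own problem.

```python
from collections import defaultdict

def per_category_error(y_true, y_pred):
    """Error rate per true category: {category: fraction misclassified}."""
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        wrong[t] += int(t != p)
    return {c: wrong[c] / totals[c] for c in totals}

def diagnose(train_err, cv_err, baseline_err=0.0, tol=0.05):
    """Rough labels; baseline_err and tol are illustrative assumptions."""
    labels = []
    if train_err - baseline_err > tol:   # poor even on the training set
        labels.append("high bias")
    if cv_err - train_err > tol:         # noticeably worse on the cv set
        labels.append("high variance")
    return labels or ["ok"]

# Toy labels, made up purely for illustration.
train_true = ["cat"] * 4 + ["dog"] * 4
train_pred = ["cat", "cat", "cat", "cat", "dog", "dog", "cat", "cat"]
cv_true = ["cat"] * 4 + ["dog"] * 4
cv_pred = ["cat", "dog", "dog", "cat", "dog", "dog", "cat", "cat"]

train_errs = per_category_error(train_true, train_pred)
cv_errs = per_category_error(cv_true, cv_pred)
report = {c: diagnose(train_errs[c], cv_errs[c]) for c in train_errs}
```

In this toy data, "cat" is fine on the training set but poor on the cv set, while "dog" is poor on both, so the two categories would call for different remedies.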

When it has high bias, chances are that adding more data for a certain category will help that category but not the overall performance, because your neural network may not have enough freedom (for example, enough neurons) to express all the crucial features for distinguishing samples of one category from samples of another.

When it has high variance, adding more data can help your neural network become less sensitive to noise, that is, to features that are neither common among your samples nor useful.

Therefore, adding data for a certain category won’t be harmful to that category, but to maximize the benefit of those extra samples, we need to know whether our model is underfitting, because in that case we would want a bigger neural network to accommodate all the useful features.

Lastly, checking for high bias and high variance is one thing, but examining the data in the poorly performing category is another. The better we understand the difference between our training samples and the real-world samples, the more likely we are to introduce genuinely useful samples into the training set. For example, if it is an image recognition model that performs poorly on cat images, and our analysis finds that the training samples never show the side view of a cat but the real-world samples do, then we know we need more side views of cats.
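One way to make that comparison concrete is to hand-tag a sample of the training images and of the misclassified real-world images, then compare how often each tag appears. The sketch below assumes such tags exist; the tag names, the helper functions, and the `min_gap` threshold are all hypothetical.

```python
from collections import Counter

def tag_share(tags):
    """Fraction of examples carrying each tag."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}

def underrepresented(train_tags, error_tags, min_gap=0.2):
    """Tags far more common among the errors than in the training set.

    min_gap is an arbitrary illustrative threshold on the share difference.
    """
    train = tag_share(train_tags)
    errors = tag_share(error_tags)
    gaps = {t: share - train.get(t, 0.0) for t, share in errors.items()}
    flagged = [t for t, gap in gaps.items() if gap >= min_gap]
    return sorted(flagged, key=lambda t: -gaps[t])

# Hypothetical hand-assigned view tags for cat images.
train_tags = ["front"] * 8 + ["back"] * 2
error_tags = ["side"] * 6 + ["front"] * 3 + ["back"] * 1

to_collect = underrepresented(train_tags, error_tags)
```

Here `to_collect` contains only the side view, which matches the cat example: that view dominates the errors but never appears in the training tags.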

Cheers,
Raymond
