Learning Curve - Week 3 - Course 2 - Machine Learning

Hi, I am currently on the Learning Curve lecture in Week 3 of Course 2 of the Machine Learning specialization.
Here they plot a curve with the error value on the y-axis and m (the training set size) on the x-axis. They plot both J(train) and J(CV) on the same plot. But the only difference between J(train) and J(CV) is the dataset they are computed on; as per the equation, both are the same. Then how, for a particular dataset, do we attain different J(train) and J(CV) values?


I have moved your question to your specialization's place in the forum; you had posted it in the general section. Now to your question: J_cv and J_train are computed on different splits of the original dataset, so they are not the same. The original dataset is split into training, validation (CV), and possibly a test subset, so the costs will not be the same.

Thanks for your quick response.
I am saying the same thing: they are different values because they are based on different data. But in that lecture they plotted a single graph against the same training set size, yet with different J values. If the training data were the same, we would attain the same values of J(train) and J(CV). I am attaching the screenshot for your reference.

This graph merely tells you how both costs change with the number of examples used, i.e., the training set size. It doesn't tell you whether the data used are the same or not; it is just indicative that the more examples you have, the better the performance and learning of your model.


Hey @arjunHack,
Just to add to @gent.spah's explanation, the concept can be easily understood with an example. Consider that you start with a dataset of 2000 samples. Now, you split the dataset into training and cross-validation sets, let's say, 1600 examples in the training set and 400 examples in the cross-validation set.

So, the initial value on the X-axis is 1600 (not 2000 or 400). And now, you simply keep on increasing the training set size, say 1600 β†’ 1800 β†’ 2000 β†’ 2200, and so on. Note that the cross-validation set still has only 400 examples, i.e., you only perform the split once (for the original dataset). In other words, J_{cv} is always calculated on the same set for this plot, while J_{train} is calculated on larger and larger training sets, and the model keeps improving as the training set grows.

You will find that as the model trains on larger and larger training sets, it generalizes better to unseen examples, i.e., J_{cv} decreases, while it also becomes harder for the model to fit every training example closely, so the extent of over-fitting on the training set decreases, i.e., J_{train} increases. I hope this helps.
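The procedure above can be sketched in code. This is a minimal illustration with a made-up synthetic dataset, a polynomial least-squares model, and arbitrary training sizes (none of these come from the lecture); the point is only that the CV set is split off once and stays fixed, while J_train is recomputed on ever larger training subsets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic dataset of 2000 examples (for illustration only)
X_all = rng.uniform(-1, 1, size=2000)
y_all = np.sin(3 * X_all) + rng.normal(scale=0.3, size=2000)

# Split ONCE: the CV set stays fixed at 400 examples for every point on the curve
X_cv, y_cv = X_all[1600:], y_all[1600:]
X_pool, y_pool = X_all[:1600], y_all[:1600]

def design(x, degree=8):
    # Polynomial features, so the model can overfit small training sets
    return np.vander(x, degree + 1)

def mse(x, y, coef):
    # Squared-error cost for the fitted coefficients
    return np.mean((design(x) @ coef - y) ** 2)

j_train_curve, j_cv_curve = [], []
for m in [20, 50, 200, 1600]:  # growing training set sizes (x-axis of the plot)
    x_tr, y_tr = X_pool[:m], y_pool[:m]
    # Fit by least squares on just the first m training examples
    coef, *_ = np.linalg.lstsq(design(x_tr), y_tr, rcond=None)
    j_train_curve.append(mse(x_tr, y_tr, coef))  # error on the data it was fit to
    j_cv_curve.append(mse(X_cv, y_cv, coef))     # error on the SAME fixed CV set
```

Plotting `j_train_curve` and `j_cv_curve` against `m` reproduces the lecture's picture: at small m the model overfits, so J_train sits below J_cv, and as m grows the two curves converge.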



oh fine, thanks. I understood

fantastic explanation. Thanks a lot

Great explanation. Thanks a lot