In the practice lab practice-lab-advice-for-applying-machine-learning, section 3.4 "Getting more data: Increasing Training Set Size (m)", there are a few points where the error on the training data set is larger than the error on the cross-validation data set. What does this mean?
Link to the classroom item you are referring to: (the forum says I can't include links; sorry, this is my first post and I'm a bit confused.)
Firstly, both the training and CV data were randomly generated. Depending on which data points were actually drawn, the calculated error values will differ; in other words, we should expect some fluctuation in each calculated value. Consequently, at m=150 in your plot, it happened that the training error came out larger.
Below is the result with a different randomly generated dataset; at m=150 the training error is no longer larger:
Therefore, the larger training error observed there was a matter of randomness and can be regarded as not significant; the sketch below illustrates this point.
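To make the randomness argument concrete, here is a minimal sketch (not the lab's code; the data-generating function, noise level, and helper name `errors_at_m` are my own assumptions). It recomputes the two errors at a fixed m=150 under several random seeds; whether the training error ends up above or below the CV error varies from seed to seed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def errors_at_m(seed, m=150, n_cv=100):
    # Draw a fresh training set of size m and a fresh CV set for this seed.
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, size=(m + n_cv, 1))
    y = 3.0 * x[:, 0] + rng.normal(0, 5, size=m + n_cv)  # noisy line
    x_tr, y_tr, x_cv, y_cv = x[:m], y[:m], x[m:], y[m:]
    model = LinearRegression().fit(x_tr, y_tr)
    return (mean_squared_error(y_tr, model.predict(x_tr)),
            mean_squared_error(y_cv, model.predict(x_cv)))

for seed in range(5):
    e_tr, e_cv = errors_at_m(seed)
    print(f"seed={seed}: train={e_tr:6.2f}  cv={e_cv:6.2f}  "
          f"{'train > cv' if e_tr > e_cv else 'train <= cv'}")
```

Because the model is well specified and m is fairly large, the expected gap between the two errors is small relative to their sampling fluctuation, so either one can come out on top for any particular draw.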
Secondly, the key message of the plot is that, as m increases, the CV error first drops rapidly and then plateaus around a certain level. The drop can be explained by reduced variance, while the plateau can be viewed as convergence (i.e., a limit beyond which more data alone will not help). The training error, however, stays at a low level throughout, because the model is trained precisely to minimize the training error.
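For completeness, here is a similar sketch of the whole learning curve (again with a synthetic quadratic dataset of my own choosing, and using MSE/2 to match the course's cost convention). The CV error starts high, drops quickly, and then plateaus near the noise floor, while the training error stays low throughout:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def make_data(n, rng):
    # Hypothetical quadratic target with Gaussian noise (std = 4).
    x = rng.uniform(0, 10, size=(n, 1))
    y = 1.0 + 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(0, 4, size=n)
    return x, y

x_all, y_all = make_data(200, rng)   # pool of training examples
x_cv, y_cv = make_data(100, rng)     # fixed cross-validation set
poly = PolynomialFeatures(degree=2, include_bias=False)

for m in range(10, 201, 10):
    x_tr, y_tr = x_all[:m], y_all[:m]  # first m examples of the pool
    model = LinearRegression().fit(poly.fit_transform(x_tr), y_tr)
    e_tr = mean_squared_error(y_tr, model.predict(poly.transform(x_tr))) / 2
    e_cv = mean_squared_error(y_cv, model.predict(poly.transform(x_cv))) / 2
    print(f"m={m:3d}  train={e_tr:6.2f}  cv={e_cv:6.2f}")
# Typical output: e_cv drops quickly, then flattens out; e_tr stays low
# the whole time because it is exactly the quantity the fit minimizes.
```

Note that even on the plateau, the two printed values will wiggle a little from row to row, which is the same fluctuation discussed above.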