In the practice lab practice-lab-advice-for-applying-machine-learning, section 3.4 "Getting more data: Increasing Training Set Size (m)", there are a few points where the error on the training data set is larger than the error on the cross-validation data set. What does this mean?
Link to the classroom item you are referring to: (the forum says I can't include links; sorry, this is my first post and I'm a bit confused.)
Firstly, both the training and CV data were randomly generated. Depending on which data points were actually drawn, the calculated error values will differ; in other words, we should expect some fluctuation in each calculated value. Consequently, at m=150 in your plot, it happened that the training error came out larger.
Below is the result with a different randomly generated dataset; at m=150 the training error is no longer larger:
Therefore, the larger training error observed there was a matter of randomness and can be regarded as not significant; the sketch below illustrates this point.
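To make the randomness argument concrete, here is a minimal sketch (not the lab's code; the data-generating function, noise level, and helper name `errors_at_m` are my own assumptions). It recomputes the two errors at a fixed m=150 under several random seeds; whether the training error ends up above or below the CV error varies from seed to seed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def errors_at_m(seed, m=150, n_cv=100):
    # Draw a fresh training set of size m and a fresh CV set for this seed.
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, size=(m + n_cv, 1))
    y = 3.0 * x[:, 0] + rng.normal(0, 5, size=m + n_cv)  # noisy line
    x_tr, y_tr, x_cv, y_cv = x[:m], y[:m], x[m:], y[m:]
    model = LinearRegression().fit(x_tr, y_tr)
    return (mean_squared_error(y_tr, model.predict(x_tr)),
            mean_squared_error(y_cv, model.predict(x_cv)))

for seed in range(5):
    e_tr, e_cv = errors_at_m(seed)
    print(f"seed={seed}: train={e_tr:6.2f}  cv={e_cv:6.2f}  "
          f"{'train > cv' if e_tr > e_cv else 'train <= cv'}")
```

Because the model is well specified and m is fairly large, the expected gap between the two errors is small relative to their sampling fluctuation, so either one can come out on top for any particular draw.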
Secondly, the key message of the plot is that, as m increases, the CV error first drops rapidly and then plateaus around a certain level. The drop can be explained by reduced variance, while the plateau can be viewed as convergence (i.e., a limit beyond which more data alone will not help). The training error, however, stays at a low level throughout, because the model is trained precisely to minimize the training error.
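For completeness, here is a similar sketch of the whole learning curve (again with a synthetic quadratic dataset of my own choosing, and using MSE/2 to match the course's cost convention). The CV error starts high, drops quickly, and then plateaus near the noise floor, while the training error stays low throughout:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def make_data(n, rng):
    # Hypothetical quadratic target with Gaussian noise (std = 4).
    x = rng.uniform(0, 10, size=(n, 1))
    y = 1.0 + 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(0, 4, size=n)
    return x, y

x_all, y_all = make_data(200, rng)   # pool of training examples
x_cv, y_cv = make_data(100, rng)     # fixed cross-validation set
poly = PolynomialFeatures(degree=2, include_bias=False)

for m in range(10, 201, 10):
    x_tr, y_tr = x_all[:m], y_all[:m]  # first m examples of the pool
    model = LinearRegression().fit(poly.fit_transform(x_tr), y_tr)
    e_tr = mean_squared_error(y_tr, model.predict(poly.transform(x_tr))) / 2
    e_cv = mean_squared_error(y_cv, model.predict(poly.transform(x_cv))) / 2
    print(f"m={m:3d}  train={e_tr:6.2f}  cv={e_cv:6.2f}")
# Typical output: e_cv drops quickly, then flattens out; e_tr stays low
# the whole time because it is exactly the quantity the fit minimizes.
```

Note that even on the plateau, the two printed values will wiggle a little from row to row, which is the same fluctuation discussed above.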