In general, **high variance** models are complex and capture the features of the training set very well, leading to minimal error on the training set, but they **fail** to generalize to **unseen** data. In contrast, **high bias** models represent extremely simple mappings; they can generalize some features to unseen data, but their simplicity leads to **underfitting** on the training set and to heavily biased predictions on data outside the training set. But if we take this quote from Andrew Ng’s book:
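To make the contrast concrete, here is a small sketch I put together (my own toy data and numbers, not from any book): a constant model underfits (high error on both sets), while a very high-degree polynomial on few points overfits (tiny training error, noticeably larger test error):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: a noisy cubic (my own illustrative setup).
x = rng.uniform(-1, 1, 20)
y = 3 * x**3 + rng.normal(0, 0.05, 20)
x_test = rng.uniform(-1, 1, 20)
y_test = 3 * x_test**3 + rng.normal(0, 0.05, 20)

def fit_poly(degree):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    w, *_ = np.linalg.lstsq(np.vander(x, degree + 1), y, rcond=None)
    train = np.mean((np.vander(x, degree + 1) @ w - y) ** 2)
    test = np.mean((np.vander(x_test, degree + 1) @ w - y_test) ** 2)
    return train, test

# High bias: a constant (degree 0) is too simple, so BOTH errors are high.
train_simple, test_simple = fit_poly(0)

# High variance: degree 15 on 20 points chases the noise, so the training
# error is tiny while the test error is noticeably larger.
train_complex, test_complex = fit_poly(15)
```

At least in this picture, each model shows only one of the two problems, which is why the quote below confused me.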

**“If we estimate the bias (training error) as 15% and the variance (cross-validation error) as 15%, then the classifier has high bias and high variance.”**

If we read this quote with **high bias** meaning **underfitting** and **high variance** meaning **overfitting**, then the model would be **suffering** from **both** overfitting and underfitting at once. That is quite weird and tremendously confusing, because I had been told that those two **can’t happen** at the same time. Here is what I found on **Stack Exchange**; it said:

- If the training data is **Xi = (x1, x2)** and we **fit** the model on x1, (x1)^2, (x1)^3, … the model **won’t** capture **x2**, so we will have **underfitting**. But conversely, including spurious **powers** of x1 (or any other spurious predictors) means that we can **overfit**, and usually will do so, unless we **regularize** in some way.
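I tried to reproduce that answer’s scenario numerically (my own toy coefficients and sample sizes, not from the original post): a high-degree polynomial in x1 alone, fit to a target that also depends on x2, shows a training error far above the noise level (bias) and a cross-validation error well above the training error (variance) at the same time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (my own coefficients): the target depends on BOTH x1 and x2.
n = 30
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y = x1 + 3 * x2 + rng.normal(0, 0.1, n)

# The model only ever sees powers of x1: x1^0, x1^1, ..., x1^9.
def powers_of_x1(x, degree=9):
    return np.vstack([x**d for d in range(degree + 1)]).T

X = powers_of_x1(x1)
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# High bias: x2 is invisible to the model, so the training error stays
# far above the noise variance (0.01) no matter how many powers we add.
train_err = np.mean((X @ w - y) ** 2)

# High variance: on fresh data, the spurious powers of x1 hurt us further,
# so the cross-validation error is clearly larger than the training error.
m = 2000
x1_cv = rng.uniform(-1, 1, m)
x2_cv = rng.uniform(-1, 1, m)
y_cv = x1_cv + 3 * x2_cv + rng.normal(0, 0.1, m)
cv_err = np.mean((powers_of_x1(x1_cv) @ w - y_cv) ** 2)
```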

I understood the example, but it still doesn’t make sense to me. Can you please give me a simple explanation of these terminologies and of how I should picture this problem? I’d appreciate it.