In general, **high variance** models are complex and capture the features of the training set very well, leading to minimal error on the training set, but they **fail** to generalize to **unseen** data. In contrast, **high bias** models represent extremely simple mappings; they can generalize some features to unseen data, but their simplicity leads to **underfitting** on the training set and to heavily biased predictions on data outside the training set. But if we take this quote from Andrew Ng’s book:
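To make the contrast concrete, here is a small sketch I put together (my own toy data and numbers, not from any book): a constant model underfits (high error on both sets), while a very high-degree polynomial on few points overfits (tiny training error, noticeably larger test error):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: a noisy cubic (my own illustrative setup).
x = rng.uniform(-1, 1, 20)
y = 3 * x**3 + rng.normal(0, 0.05, 20)
x_test = rng.uniform(-1, 1, 20)
y_test = 3 * x_test**3 + rng.normal(0, 0.05, 20)

def fit_poly(degree):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    w, *_ = np.linalg.lstsq(np.vander(x, degree + 1), y, rcond=None)
    train = np.mean((np.vander(x, degree + 1) @ w - y) ** 2)
    test = np.mean((np.vander(x_test, degree + 1) @ w - y_test) ** 2)
    return train, test

# High bias: a constant (degree 0) is too simple, so BOTH errors are high.
train_simple, test_simple = fit_poly(0)

# High variance: degree 15 on 20 points chases the noise, so the training
# error is tiny while the test error is noticeably larger.
train_complex, test_complex = fit_poly(15)
```

At least in this picture, each model shows only one of the two problems, which is why the quote below confused me.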

**“If we estimate the bias (training error) as 15% and the variance (cross-validation error) as 15%, then the classifier has high bias and high variance.”**

If we read this quote with **high bias** meaning **underfitting** and **high variance** meaning **overfitting**, then the model would be **suffering** from **both** overfitting and underfitting at once. That is quite weird and tremendously confusing, because I had been told that those two **can’t happen** at the same time. Here is what I found on **Stack Exchange**; it said:

- If the training data is **Xi = (x1, x2)** and we **fit** the model on x1, (x1)^2, (x1)^3, … the model **won’t** capture **x2**, so we will have **underfitting**. But conversely, including spurious **powers** of x1 (or any other spurious predictors) means that we can **overfit**, and usually will do so, unless we **regularize** in some way.
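I tried to reproduce that answer’s scenario numerically (my own toy coefficients and sample sizes, not from the original post): a high-degree polynomial in x1 alone, fit to a target that also depends on x2, shows a training error far above the noise level (bias) and a cross-validation error well above the training error (variance) at the same time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (my own coefficients): the target depends on BOTH x1 and x2.
n = 30
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y = x1 + 3 * x2 + rng.normal(0, 0.1, n)

# The model only ever sees powers of x1: x1^0, x1^1, ..., x1^9.
def powers_of_x1(x, degree=9):
    return np.vstack([x**d for d in range(degree + 1)]).T

X = powers_of_x1(x1)
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# High bias: x2 is invisible to the model, so the training error stays
# far above the noise variance (0.01) no matter how many powers we add.
train_err = np.mean((X @ w - y) ** 2)

# High variance: on fresh data, the spurious powers of x1 hurt us further,
# so the cross-validation error is clearly larger than the training error.
m = 2000
x1_cv = rng.uniform(-1, 1, m)
x2_cv = rng.uniform(-1, 1, m)
y_cv = x1_cv + 3 * x2_cv + rng.normal(0, 0.1, m)
cv_err = np.mean((powers_of_x1(x1_cv) @ w - y_cv) ** 2)
```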

I understood the example, but it still doesn’t make sense to me. Can you please give me a simple explanation of these terminologies and of how I should picture this problem? I’d appreciate it.