Hi there,
as an example, let's assume you want to predict the value of the S&P 500 in ten years.
Very roughly speaking, let's assume a 7% return per year. A possible model could then be future_p = today_p * 1.07^t. With t = 10 years you get 1.07^10 ≈ 1.97, i.e. roughly a doubling in value, if the model is a correct estimate for the value in 10 years.
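Just to make the arithmetic concrete, here is a minimal sketch of that calculation (the index level and return are purely illustrative assumptions):

```python
# Compound-growth model: future_p = today_p * (1 + r)^t
today_p = 100.0   # hypothetical index level today (illustrative)
r = 0.07          # assumed 7% annual return
t = 10            # horizon in years

future_p = today_p * (1 + r) ** t
print(round(future_p, 1))   # ~196.7, i.e. roughly a doubling
```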
What would underfitting mean?
Well: if your model were not exponential, as in our case, but e.g. linear, it might be too simple to capture the underlying cause-effect relationships. If a network has too few parameters or is regularized too strongly, this can also lead to underfitting with a strong bias. This bias shows up as a systematic error in your residuals.
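As a rough sketch of that (the synthetic data and all numbers below are my own assumptions, not from your case): fitting a straight line to data that actually grows exponentially leaves a clearly systematic, curved pattern in the residuals.

```python
import numpy as np

# Underfitting sketch (assumed synthetic data): the true relationship is
# exponential, but we fit a straight line, which is too simple. The residuals
# then show a systematic (curved) pattern -- that is the bias.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 50)
y_true = 100.0 * 1.07 ** t
y_noisy = y_true + rng.normal(0, 2, size=t.shape)

coeffs = np.polyfit(t, y_noisy, deg=1)   # linear model: high bias
y_linear = np.polyval(coeffs, t)

residuals = y_noisy - y_linear
print(residuals[:3], residuals[-3:])     # roughly positive at the ends, negative in the middle
```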
High variance means the exact opposite: your model is too complex (and maybe not regularized sufficiently) and overfits on noise instead of learning the actual behaviour. Often this goes hand in hand with having too little data given the feature space (I tried to explain this in more detail here: https://github.com/christiansimonis/CRISP-DM-AI-tutorial/blob/master/Classic_ML.ipynb )
An example of overfitting is using an oscillating model (e.g. a high-degree polynomial) for a rather linear relationship, where the noise affects the model too much.
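A rough illustration of that (again, the synthetic data and parameters are assumptions on my side): a high-degree polynomial fitted to a noisy but basically linear relationship will usually have a larger test error than the simple linear fit, because it chases the noise.

```python
import numpy as np

# Overfitting sketch (assumed synthetic data): the true relationship is linear,
# but a high-degree polynomial fits the noise and typically generalizes worse.
rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 15)
y_train = 2.0 * x_train + rng.normal(0, 0.1, size=x_train.shape)

wiggly = np.polyfit(x_train, y_train, deg=10)   # far too flexible -> high variance
simple = np.polyfit(x_train, y_train, deg=1)    # matches the true structure

x_test = np.linspace(0, 1, 200)
y_test = 2.0 * x_test                           # noise-free ground truth
mse_wiggly = np.mean((np.polyval(wiggly, x_test) - y_test) ** 2)
mse_simple = np.mean((np.polyval(simple, x_test) - y_test) ** 2)
print(mse_wiggly, mse_simple)                   # the wiggly fit is usually worse
```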
Also, the Wikipedia article might be helpful:
The bias–variance dilemma or bias–variance problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set:[1][2]
- The bias error is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
- The variance is an error from sensitivity to small fluctuations in the training set. High variance may result from an algorithm modeling the random noise in the training data (overfitting).
(Source: Bias–variance tradeoff - Wikipedia)
Best
Christian