Bias and variance seem counterintuitive

I feel that bias and variance in deep learning clash with my statistical understanding.

Shouldn’t they get switched with each other?

This is my reasoning:

In statistical inference, a model that only explains the sample and cannot be generalized to the population has a high bias (i.e., an overfitted network). And a model that’s working with a small sample and imprecisely predicts the population parameters is characterized as having a high variance (i.e., an underfitted network or a small data set).

It seems the opposite is true in deep learning.

Thanks for your insight.

That’s an interesting point. Sorry, but my background does not include much statistics, so the statistical way of using those terms is new to me. It sounds like this is just a difference between the statistics world and the ML/DL/AI world. I come more from a math background, so a much more specific and minor example of a terminology difference between the ML world and the “math world” is that in many math and engineering courses, log means base 10 and ln means natural log. But in the ML world, logarithms are almost always natural logs, and log means natural log.
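
As a small illustration of the logarithm convention, here is a minimal sketch using NumPy (my choice of library, not something from the course). In NumPy, `np.log` is the natural log, and other bases have their own explicit names:

```python
import numpy as np

# np.log is the natural logarithm (what math texts often write as ln).
natural = np.log(np.e)

# Other bases are spelled out explicitly.
base_10 = np.log10(100.0)
base_2 = np.log2(8.0)

print(natural, base_10, base_2)
```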

Hi, @Seongha_Yi!

Intuitively, in deep learning, we use the term bias to talk about a model that is underfitting the data. You can associate this idea with a linear regression that uses only the bias term (the coefficient that multiplies x^0). When we have an overfitted model, we say it has high variance, which you can associate with the high-order terms of the polynomial that an overly flexible regression would produce (ax^{100}+bx^{99}+ \cdots).

Oh, that’s interesting. So the terms ‘variance’ and ‘bias’ have to do with the formula rather than the actual concepts.

Thanks for your answer

P. S.

I think I can intuitively understand why the coefficient that multiplies x^0 is called bias, although there’s no explanation in the course. Since the bias term is a constant that shifts the line up and down, it prevents the model from consistently under- or overestimating the population parameters. In other words, the bias term helps the model deal with bias.

Am I right?

This is the way I came to terms with the concept.

An overfitted model can accurately explain data that are similar to our sample but struggles with data that are different from the sample. In other words, the model’s accuracy varies from data set to data set. Thus, high variance.

On the other hand, an under-fitted model can accurately explain neither our sample nor any other data (i.e., it consistently underestimates or overestimates the result). So this model is biased.
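
To make this concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the sine-shaped "true" function, the noise level, the polynomial degrees, and the helper name `bias_variance`. It refits a polynomial on many noisy samples of the same underlying function, then measures how far the average prediction is from the truth (squared bias) and how much the predictions move from sample to sample (variance).

```python
import numpy as np
from numpy.polynomial import polynomial as P


def true_fn(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_datasets=200, n_points=30, noise=0.3, seed=0):
    """Estimate squared bias and variance of a degree-`degree` polynomial fit."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_points)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        # Each "data set" is a fresh noisy sample of the same function.
        y_train = true_fn(x_train) + rng.normal(0.0, noise, n_points)
        coeffs = P.polyfit(x_train, y_train, degree)
        preds[i] = P.polyval(x_test, coeffs)
    mean_pred = preds.mean(axis=0)
    # Squared bias: gap between the average prediction and the truth.
    bias_sq = np.mean((mean_pred - true_fn(x_test)) ** 2)
    # Variance: spread of predictions across data sets.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


for degree in (0, 3, 9):
    b, v = bias_variance(degree)
    print(f"degree={degree}  bias^2={b:.4f}  variance={v:.4f}")
```

Under these assumptions, the degree-0 model (bias term only) should show the largest squared bias and the smallest variance, while the degree-9 model should show the reverse, which matches the underfitting and overfitting descriptions above.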

Different fields using different notations for the same concepts is both fascinating and frustrating. For example, I was accustomed to vectors having an arrow over their head in physics. However, in ML, vectors are just symbols, which makes them difficult to tell apart from scalars.

Just a comment on vector notation. This is also mentioned on Wikipedia.

The ISO recommends either bold italic serif, as in \boldsymbol{v}, or non-bold italic serif accented by a right arrow, as in \vec{v}.

It’s not that the bias term helps the model. It’s more an intuition about the result of a linear regression performed with just a bias term: an underfitting model that does not approximate the data well.
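
Here is a minimal sketch of that intuition, assuming made-up linear data (slope 2, intercept 1, a little Gaussian noise) and plain least squares via NumPy. A model with only a bias term can do no better than predict the mean of y everywhere, so it underfits:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
# Hypothetical data: a noisy straight line.
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, x.size)

# Model with only a bias term: the design matrix is a single column of ones.
ones = np.ones((x.size, 1))
(b_only,), *_ = np.linalg.lstsq(ones, y, rcond=None)

# Model with a weight and a bias term.
X = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(X, y, rcond=None)

mse_bias_only = np.mean((y - b_only) ** 2)
mse_linear = np.mean((y - (w * x + b)) ** 2)

print(f"bias-only prediction: {b_only:.3f}  (mean of y: {y.mean():.3f})")
print(f"MSE bias-only: {mse_bias_only:.4f}  MSE weight+bias: {mse_linear:.4f}")
```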

Exactly @Seongha_Yi, you’re right.

In DL, we assume that almost every variable we talk about is a matrix. Even scalars: they are 1x1 matrices.
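
A quick NumPy sketch of that convention (the variable names are just for illustration):

```python
import numpy as np

# A "scalar" stored as a 1x1 matrix.
s = np.array([[3.0]])
print(s.shape)  # (1, 1)

# np.atleast_2d promotes a plain Python number to the same 1x1 shape.
promoted = np.atleast_2d(3.0)

# Broadcasting lets the 1x1 matrix scale a larger matrix element-wise.
W = np.ones((2, 3))
scaled = s * W
print(scaled.shape)  # (2, 3)
```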

All the explanations were great :+1: Just as a small input, maybe you can think about it this way:
If your model is not complex enough, or, as you mentioned, cannot generalize given its lack of complexity, it will have a high bias (as expected). However, if you have a very complex model that tries to fit all the available data (including noise and outliers), then you tend to get significant changes in the output even when the changes in the input are not that significant. In other words, your model's outputs will have a high variance because it was trying to model aspects of the data that are not related to the domain and are merely noise (e.g., measurement error). I suggest you read the math here and try to extend it to your question regarding ML :slight_smile:
These terms do have something to do with the concepts you have learned before :wink:
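
For reference, this is the standard bias-variance decomposition of the expected squared error, written in notation I am assuming here (it may differ from the linked material): y = f(x) + \varepsilon with E[\varepsilon] = 0 and Var(\varepsilon) = \sigma^2, and \hat{f}(x; D) is the model trained on a random data set D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}(x; D)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}(x; D)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}(x; D) - \mathbb{E}_D[\hat{f}(x; D)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The bias term is the statistical bias of \hat{f}(x; D) as an estimator of f(x), and the variance term is its sampling variance across data sets, so the ML usage lines up with the statistical definitions.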

That’s very helpful. Thanks

This really deepens my understanding of variance and bias. Glad I got the concepts right.

Thank you.

That clarifies things.