Bias and variance seem counterintuitive

Seongha_Yi · June 1, 2022, 12:36am

I feel that bias and variance in deep learning clash with my statistical understanding.

Shouldn’t they get switched with each other?

This is my reasoning:

In statistical inference, a model that only explains the sample and cannot be generalized to the population has a high bias (i.e., an overfitted network). And a model that works with a small sample and imprecisely predicts the population parameters is characterized as having a high variance (i.e., an underfitted network or a small data set).

It seems the opposite is true in deep learning.

Thanks for your insight.

paulinpaloalto · June 1, 2022, 4:48am

That’s an interesting point. Sorry, but my background does not include much statistics, so the statistical way of using those terms is new to me. It sounds like this is just a difference between the statistics world and the ML/DL/AI world. I come more from a math background, so a much more specific and minor version of a terminology difference between the ML world and “math world” is that in math log means base 10 and ln means natural log. But in the ML world, the only logarithms are natural logs and log always means natural log.

alvaroramajo · June 1, 2022, 12:05pm

Hi, @Seongha_Yi!

Intuitively, in Deep Learning, we use the term bias to talk about a model that is underfitting the data. You can associate this idea with a linear regression approximation that only uses the bias term (the coefficient that multiplies $x^0$). When we have an overfitted model, we say it has a high variance, referring to the high-order terms of the polynomial that would result from a very flexible regression fit ($ax^{100}+bx^{99}+\cdots$).
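As a rough illustration of that intuition, here is a minimal sketch comparing a fit that uses only the $x^0$ coefficient with a fit that has many higher-order terms. The noisy sine data, the degree 9, and the helper name `train_mse` are all assumptions made up for this example, not something from the course.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

# Hypothetical toy data (an assumption for illustration): a noisy sine curve.
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

def train_mse(degree):
    """Least-squares polynomial fit of the given degree; returns the training MSE."""
    coeffs = P.polyfit(x, y, degree)
    pred = P.polyval(x, coeffs)
    return float(np.mean((pred - y) ** 2))

mse_constant = train_mse(0)  # only the x^0 coefficient: the fit is just mean(y)
mse_flexible = train_mse(9)  # many higher-order terms: can bend to follow the noise

print(f"bias-term-only training MSE: {mse_constant:.3f}")
print(f"degree-9 training MSE:       {mse_flexible:.3f}")
```

The constant-only model cannot follow the curve at all, which is the underfitting case. The degree-9 model gets a much lower training error, partly by chasing the noise, and low training error alone says nothing about how it behaves on new data.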

Seongha_Yi · June 1, 2022, 12:39pm

Oh, that’s interesting. So the terms ‘variance’ and ‘bias’ have to do with the formula rather than the actual concepts.

Thanks for your answer

P. S.

I think I can intuitively understand why the coefficient that multiplies $x^0$ is called bias although there’s no explanation in the course. Since the bias term is a constant that shifts the line up and down, it prevents the model from consistently under- or overestimating the population parameters. In other words, the bias term helps the model deal with bias.

Am I right?

Seongha_Yi · June 1, 2022, 12:46pm

This is the way I came to terms with the concept.

An overfitted model can accurately explain data that are similar to our sample but struggles with data that are different from the sample. In other words, the model’s accuracy varies from data set to data set. Thus, high variance.

On the other hand, an under-fitted model can accurately explain neither our sample nor any other data (i.e., it consistently underestimates or overestimates the result). So this model is biased.
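A small simulation can make this concrete. The sketch below refits a simple and a flexible model on many freshly sampled training sets and measures how far the average prediction is from the truth (bias squared) and how much the predictions move from one training set to the next (variance). The true function, the noise level, the polynomial degrees, and the fixed input grid are all assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

def true_fn(x):
    # Assumed ground truth for this toy example.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 20)   # fixed inputs; only the noise is resampled
x_test = np.linspace(0.05, 0.95, 50)  # points where predictions are compared

def bias2_and_variance(degree, n_sets=200, noise=0.3):
    """Refit a polynomial on n_sets noisy training sets and summarize predictions."""
    preds = np.empty((n_sets, x_test.size))
    for i in range(n_sets):
        y_train = true_fn(x_train) + rng.normal(scale=noise, size=x_train.size)
        preds[i] = P.polyval(x_test, P.polyfit(x_train, y_train, degree))
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - true_fn(x_test)) ** 2)  # systematic error
    variance = np.mean(preds.var(axis=0))                # spread across training sets
    return float(bias2), float(variance)

bias2_simple, var_simple = bias2_and_variance(degree=0)    # under-fitted model
bias2_complex, var_complex = bias2_and_variance(degree=9)  # over-fitted model

print(f"degree 0: bias^2={bias2_simple:.4f}, variance={var_simple:.4f}")
print(f"degree 9: bias^2={bias2_complex:.4f}, variance={var_complex:.4f}")
```

In this setup the constant model is wrong in the same way on every training set (high bias, low variance), while the degree-9 model is right on average but its predictions change noticeably with each new sample (low bias, high variance).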

Seongha_Yi · June 1, 2022, 12:49pm

Different fields using different notations for the same concepts is both fascinating and frustrating. For example, I was accustomed to vectors having an arrow over their head in physics. However, in ML, vectors are just symbols, which makes them difficult to tell apart from scalars.

anon57530071 · June 1, 2022, 1:17pm

Just a comment on vector notation. This is also noted on Wikipedia.

The ISO recommends either bold italic serif, as in $\boldsymbol{v}$, or non-bold italic serif accented by a right arrow, as in $\vec{v}$.

alvaroramajo · June 1, 2022, 1:21pm

It’s not so much that the bias term helps the model; it’s more an intuition about the result of a linear regression performed with just a bias term: an underfitting model that does not approximate the data well.

alvaroramajo · June 1, 2022, 1:22pm

Exactly @Seongha_Yi, you’re right.

alvaroramajo · June 1, 2022, 1:23pm

In DL, we assume that almost every variable we talk about is a matrix. Even scalars: they are 1x1 matrices.
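For example, in NumPy a scalar bias can be stored as a 1x1 array and still combine with larger matrices through broadcasting. The shapes and values below are made up for illustration.

```python
import numpy as np

W = np.array([[0.5, -1.0, 2.0]])  # weights, shape (1, 3)
X = np.ones((3, 4))               # 4 examples with 3 features each, shape (3, 4)
b = np.array([[0.1]])             # the "scalar" bias stored as a 1x1 matrix

Z = W @ X + b                     # b is broadcast across all 4 columns

print(b.shape, Z.shape)
```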

kiavash_fathi · June 1, 2022, 1:55pm

All the explanations were great. Just as a small input, maybe you can think about it this way:
If your model is not complex enough, or as you mentioned cannot be generalized given its lack of complexity, it will have a high bias (as expected). However, if you have a very complex model which tries to fit the available data (with all the possible noise and outliers), then you tend to have significant changes in the output even if the changes in the input are not that significant. In other words, your model will have outputs with a high variance, given the fact that it was trying to model some aspects of the data which are not domain (or data) related and are merely noise (e.g., measurement error). I suggest you read the math here and try to extend it to your question regarding ML.
These terms do have something to do with the concepts you have learned before.
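For reference, the standard bias-variance decomposition of the expected squared error ties the two terms together. The notation is an assumption for this sketch: the data follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the model fitted on a random training set, and the expectation is taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \sigma^2
$$

The first term is the statistical bias of $\hat{f}(x)$ as an estimator of $f(x)$, and the second is its variance across training sets, so the ML usage lines up with the usual statistical definitions.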

Seongha_Yi · June 1, 2022, 4:33pm

That’s very helpful. Thanks

Seongha_Yi · June 1, 2022, 4:34pm

This really deepens my understanding of variance and bias. Glad I got the concepts right.

Thank you.

Seongha_Yi · June 1, 2022, 4:35pm

That clarifies things.
Thanks.
