I feel that bias and variance in deep learning clash with my statistical understanding.
Shouldn’t they get switched with each other?
This is my reasoning:
In statistical inference, a model that only explains the sample and cannot be generalized to the population has high bias (i.e., an overfitted network), and a model that works with a small sample and imprecisely predicts the population parameters is characterized as having high variance (i.e., an underfitted network or a small data set).
It seems the opposite is true in deep learning.
Thanks for your insight.
That’s an interesting point. Sorry, my background does not include much statistics, so the statistical way of using those terms is new to me. It sounds like this is just a difference between the statistics world and the ML/DL/AI world. I come more from a math background, and a smaller, more specific example of a terminology difference between the ML world and the “math world” is that in math, log means base 10 and ln means the natural log. But in the ML world, the only logarithms are natural logs and log always means the natural log.
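For what it’s worth, the common numerical libraries follow the ML convention here; a quick check with Python’s math module and NumPy (just an illustration, assuming NumPy is installed):

```python
import math
import numpy as np

# In both the standard library and NumPy, "log" means the natural logarithm.
print(math.log(math.e))   # 1.0
print(np.log(np.e))       # 1.0

# Base-10 logarithms get their own explicit names.
print(math.log10(1000))   # 3.0
print(np.log10(1000))     # 3.0
```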
Hi, @Seongha_Yi!
Intuitively, in Deep Learning we use bias to talk about a model that is underfitting the data. You can associate this idea with a linear regression approximation that only uses the bias term (the coefficient that multiplies $x^0$). When we have an overfitted model, we say it has high variance, referring to the high-order terms of the polynomial that an overly flexible regression fit would produce ($ax^{100} + bx^{99} + \cdots$).
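A minimal sketch of that intuition, assuming NumPy is available (the curve, noise level, and degrees 0 and 9 are just illustrative choices): the degree-0 fit keeps only the bias term and underfits, while the high-degree fit tracks the noise in the training sample and does worse on a fresh sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sample(n=30):
    """Draw a noisy sample from the same underlying curve."""
    x = np.linspace(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = noisy_sample()
x_test, y_test = noisy_sample()

# Degree 0 keeps only the constant (bias) term -> underfits.
# Degree 9 has many high-order terms -> chases the noise in the training set.
for degree in (0, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```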
Oh, that’s interesting. So the terms ‘variance’ and ‘bias’ have to do with the formula rather than the actual concepts.
Thanks for your answer
P. S.
I think I can intuitively understand why the coefficient that multiplies $x^0$ is called the bias, although there’s no explanation in the course. Since the bias term is a constant that shifts the line up and down, it prevents the model from consistently under- or overestimating the population parameters. In other words, the bias term helps the model deal with bias.
Am I right?
This is the way I came to terms with the concept.
An overfitted model can accurately explain data that are similar to our sample but struggles with data that are different from the sample. In other words, the model’s accuracy varies from data set to data set. Thus, high variance.
On the other hand, an underfitted model cannot accurately explain our sample or any other data (i.e., it consistently underestimates or overestimates the result). So this model is biased.
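That reading can be checked with a small simulation (a sketch, assuming NumPy; the curve, noise, and degrees are arbitrary choices): refit each model on many fresh samples and look at its prediction at one fixed point. The bias-only fit is consistently far from the truth but barely moves; the high-degree fit is right on average but swings from sample to sample.

```python
import numpy as np

rng = np.random.default_rng(1)
x_grid = np.linspace(0, 1, 25)
x0 = 0.3                               # fixed input at which we inspect predictions

def true_f(x):
    return np.sin(2 * np.pi * x)       # the "population" curve the samples come from

def fit_and_predict(degree):
    """Fit a polynomial of the given degree to a fresh noisy sample, predict at x0."""
    y = true_f(x_grid) + rng.normal(scale=0.2, size=x_grid.size)
    coeffs = np.polyfit(x_grid, y, deg=degree)
    return np.polyval(coeffs, x0)

for degree in (0, 9):
    preds = np.array([fit_and_predict(degree) for _ in range(200)])
    bias = preds.mean() - true_f(x0)   # systematic offset of the average prediction
    variance = preds.var()             # how much the prediction moves between samples
    print(f"degree {degree}: bias = {bias:+.3f}, variance = {variance:.3f}")
```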
Different fields using different notations for the same concepts is both fascinating and frustrating. For example, I was accustomed to vectors having an arrow over them in physics. In ML, however, vectors are written as plain symbols, which makes them hard to tell apart from scalars.
Just a comment about vector notation; this is also noted on Wikipedia:
The ISO recommends either bold italic serif, as in $\boldsymbol{v}$, or non-bold italic serif accented by a right arrow, as in $\vec{v}$.
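In LaTeX terms, the two recommended forms could be typeset like this (just an illustration; \boldsymbol is provided by amsmath):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Two ways to write a vector named v, per the ISO recommendation quoted above:
% bold italic, or non-bold italic accented by a right arrow.
$\boldsymbol{v}$ \qquad $\vec{v}$
\end{document}
```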
It’s not that the bias term helps the model; it’s more an intuition about the result of a linear regression performed with just a bias term: an underfitting model that does not approximate the data well.
Exactly @Seongha_Yi, you’re right.
In DL, we assume that almost every variable we talk about is a matrix. Even scalars: they are 1x1 matrices.
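A tiny NumPy illustration of that convention (the numbers are arbitrary):

```python
import numpy as np

b = np.array([[3.0]])          # a "scalar" stored as a 1x1 matrix
W = np.array([[1.0, 2.0]])     # a 1x2 matrix (a row vector)
x = np.array([[0.5], [1.5]])   # a 2x1 matrix (a column vector)

z = W @ x + b                  # every operand is a matrix, so the shapes stay explicit
print(z.shape)                 # (1, 1)
print(z.item())                # 6.5 -- the plain scalar value inside
```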
That’s very helpful. Thanks
This really deepens my understanding of variance and bias. Glad I got the concepts right.
Thank you.
That clarifies things.
Thanks.