Confused about different machine learning terms

Here are the terms that scare me whenever I hear someone talking about them. Please help me overcome the fear.

  • bias (comes when underfitting)
  • variance (comes when overfitting)
  • bias-variance tradeoff (comes when best fit)
  • precision and recall (when checking accuracy)

Is the bias related to the b term of the linear equation $y = \vec{W} \vec{X} + \mathbf{b}$?


The bias is not related to the “b” term in the linear equation y = WX + b.

Here, bias refers to a type of error in a model’s predictions: the model systematically favors some outcomes over others. An example of bias has been seen in models used in Human Resources for personnel selection, where many models were found to prefer selecting males over females. This is a bias in the model.

There are several techniques to reduce or eliminate bias, such as these two:

  1. Data pre-processing: make sure that the dataset used to train the model represents all the possible classes in a balanced way.
  2. Regularization
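
As a rough illustration of the data pre-processing point, here is a minimal sketch of a class-balance check. The helper names (`class_balance`, `is_balanced`), the 10% tolerance, and the 80/20 hiring dataset are all hypothetical and only serve the example.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of the dataset that each class takes up."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def is_balanced(labels, tolerance=0.1):
    """True if every class share is within `tolerance` of an equal split."""
    shares = class_balance(labels)
    equal_share = 1 / len(shares)
    return all(abs(share - equal_share) <= tolerance for share in shares.values())


# Hypothetical hiring dataset: 80 male and 20 female examples.
labels = ["male"] * 80 + ["female"] * 20
shares = class_balance(labels)
balanced = is_balanced(labels)
print(shares, balanced)
```

A check like this only flags the imbalance; what to do about it (collecting more data, resampling, reweighting) is a separate decision.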

Hi @tbhaxor,

I’m going to try my best to explain these with a very basic example.

Consider a dataset of 10 images, 5 cat and 5 non-cat (in real life the dataset should have far more than 10 images and should contain both cat and non-cat images).

I train the model on those 10 images for, let’s say, 5 iterations. 5 iterations are not enough, so what might happen is that when I give it a real-life image of a dog to predict on, the model could predict it as a cat, based on the assumption that the animal in the training images (a cat) had two ears, and since the dog also has two ears, it must be a cat. That’s underfitting.

Now consider that I take these 10 images and train the model for 1000 iterations. What will happen is that when I give it an actual image of a cat to predict on, it will not consider it a cat, because for the model cats only look like what they look like in those 5 training images. That’s overfitting.

The bias-variance tradeoff then is that you have to find a middle ground where there’s just “enough” fitting that it makes sound predictions, even on the images it has never seen before.
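
To see the same pattern in numbers, here is a small sketch. It does not use images or training iterations; instead it swaps in a k-nearest-neighbour regressor on made-up 1-D data, where the single knob `k` moves the model from "too simple" to "memorising". The target function, noise level, sample sizes, and helper names are all assumptions chosen for the illustration.

```python
import math
import random

random.seed(0)  # fixed seed so repeated runs give the same numbers


def make_data(n, noise=0.3):
    """Draw n points from a hypothetical target y = sin(2*pi*x) plus Gaussian noise."""
    xs = [random.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + random.gauss(0, noise) for x in xs]
    return xs, ys


def knn_predict(x, train_x, train_y, k):
    """Predict by averaging the targets of the k training points nearest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(xs, ys, train_x, train_y, k):
    """Average squared gap between k-NN predictions and the observed targets."""
    squared = [(knn_predict(x, train_x, train_y, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(squared) / len(squared)


train_x, train_y = make_data(30)
test_x, test_y = make_data(200)

# k = 30 averages every training point (one constant guess: too simple),
# k = 1 copies the single nearest training point (memorisation),
# k = 5 sits in between.
results = {}
for k in (30, 5, 1):
    train_error = mean_squared_error(train_x, train_y, train_x, train_y, k)
    test_error = mean_squared_error(test_x, test_y, train_x, train_y, k)
    results[k] = (train_error, test_error)
    print(f"k={k:2d}  train MSE={train_error:.3f}  test MSE={test_error:.3f}")
```

With k = 1 the training error is exactly zero, because every training point is its own nearest neighbour, yet the error on unseen points is not zero. With k = 30 the model ignores x entirely, and both errors stay well above those for k = 5.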


Now let’s talk about precision and recall.

But first, let’s talk about why you need them.

(The explanation below is independent of the example used above.)

Let’s say I have 10 images: 6 are cat and 4 are non-cat. Out of the 4 non-cat images, my model predicts 2 as cat and 2 as non-cat.

And out of the 6 cat images, it predicts 5 as cat and 1 as non-cat.

So in total, it predicted 7 images correctly (5 cat and 2 non-cat). 7/10 gives us 70% accuracy on these 10 images, which one might think is not bad. Based just on the accuracy metric, our model is getting things right 70% of the time.

But if we look closely, we realise it is not doing well on the non-cat images. Out of the 4, it only predicted 2 as non-cat, so that’s 50% accuracy on the non-cat images (and 5/6, 83.3%, on the cat images). You would not want to deploy a model that performs this poorly on non-cat images.

That’s where the metrics of precision and recall come in, and that’s why you need them to check the in-depth performance of your model.
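
The 10-image example can be worked through in a few lines. This is only a sketch: the label strings and the helper names (`confusion_counts`, `precision_recall`) are made up here, and precision and recall use their standard definitions, TP / (TP + FP) and TP / (TP + FN).

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN, treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# 6 cat images (5 predicted cat, 1 predicted non-cat),
# then 4 non-cat images (2 predicted cat, 2 predicted non-cat).
y_true = ["cat"] * 6 + ["non-cat"] * 4
y_pred = ["cat"] * 5 + ["non-cat"] + ["cat"] * 2 + ["non-cat"] * 2

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
cat_precision, cat_recall = precision_recall(y_true, y_pred, "cat")
noncat_precision, noncat_recall = precision_recall(y_true, y_pred, "non-cat")

print(f"accuracy={accuracy:.3f}")
print(f"cat:     precision={cat_precision:.3f} recall={cat_recall:.3f}")
print(f"non-cat: precision={noncat_precision:.3f} recall={noncat_recall:.3f}")
```

Recall for each class reproduces the per-class figures above (5/6 for cat, 2/4 for non-cat). Precision adds the other view: of the 7 images the model called "cat", only 5 really were cats.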

This is a good read on precision and recall.

Hope all of this helps.

Cheers,
Mubsi


What I found is that machine learning is easy, but English is difficult :laughing:

So it happens in two cases (either or both can hold):

  • less data: the model cannot foresee the best data points and therefore fails to draw a line for the test or production data. It is intuition-based: “I can get max accuracy with this model, so I assume this is the best decision boundary.”
  • the function is not good for the problem: for example, Andrew sir initially used the linear function on the sigmoid data; though it can work best for these inputs, it will fail when other data points are added to the graph.

Please correct me if I am wrong. This thread has a lot of discussion; because of these 4 terms I always get demotivated and drop machine learning courses. I hope that will not be the case here :smile:

Is it true? I searched for it on Machine Learning Mastery and they said something different: https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/


When we achieve it, it is called a generalized model.

Also, one question: do we discuss these errors while evaluating the model on test data (except underfitting)?

Here is my intuition for the overfitting.

When the model assumes that only this data will ever be there and fits a strict line through all the points, regardless of how far (variance) they are from the actual line, it is called overfitting. The model learns the data as much as it can, to the point that we humans call “cramming”. So when a new data point comes, it fails to classify or estimate the output for it. Because the model is fitting (drawing a line through) each and every data point no matter how much they vary, it is also called high variance.

It is like being told to study from a book in any way you want: as a student, you map question numbers to the answers and presume that exactly the same questions will come in the exam tomorrow.

Well, let me change my word ‘error’ for ‘phenomenon’. An undesirable one.

HOWEVER: In the very same article you are pointing me to, the author also calls Bias an Error. Check out the title “Overview of Bias and Variance”.

I think that Jason Brownlee develops a very good explanation of these terms in the article you mention. I’ll add: “If Jason B. says it, I believe it” :slight_smile:

I’d like to change this to say, “less data, the model cannot foresee all the possible real world data points and therefore fails to draw a good partition line.”

Again, a reminder, how I explained things above is a very basic example, of course, there’s a lot more than that. But with that explanation, I was trying to help you understand the concepts of underfitting and overfitting.

That is correct.

We shall try our best so that this doesn’t happen again.


That makes sense. Oh, one more thing: we often say “this data has bias” or “there is a bias in the feature”; mostly we see this in data collected from surveys.

Also, Jason B. and Juan have attached the word “error” (I call it noise) to bias. If bias is so bad and we try to prevent it from being added in machine learning, why does the model depend on bias to converge? Bias (according to Jason B.) is the set of assumptions made so that the model can learn more easily. Like, let’s assume this will be the case and try to fit on it.

Here I am not talking about b. I get that if a weight always has 1 as its coefficient, then it is called the bias or intercept (specifically in linear regression). I am talking about bias as a belief or feeling (whatever).

That is yet a different meaning of the word “bias”: that is closer to the classic definition of that word that you will find if you look it up in the dictionary. An example would be if you are trying to build a recommender system to recommend movies that someone might like and you only collect data from people under 30 years old. Then the system will do a bad job of making recommendations for people over the age of 60, so that dataset is biased towards the preferences of younger people. You have to make sure that the data you use to train your system represents the full range of the things you need the system to predict.
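
To make the recommender example concrete, here is a toy sketch. Everything in it is made up for illustration: the `true_rating` function, the age ranges, and the "model", which simply predicts the average rating it saw during training.

```python
def true_rating(age):
    """Hypothetical ground truth: enjoyment of some movie falls linearly with age."""
    return 10 - age / 10


# The training data covers only people under 30.
train_ages = list(range(18, 30))
model_prediction = sum(true_rating(age) for age in train_ages) / len(train_ages)


def mean_abs_error(ages):
    """Average gap between the single learned prediction and the true ratings."""
    return sum(abs(model_prediction - true_rating(age)) for age in ages) / len(ages)


error_young = mean_abs_error(range(18, 30))
error_old = mean_abs_error(range(60, 81))
print(f"error for ages 18-29: {error_young:.2f}")
print(f"error for ages 60-80: {error_old:.2f}")
```

The prediction is close for the age group the data came from and far off for the group that was never sampled, which is the sense in which the dataset is biased.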

Then in math they use “bias” to mean the b term in a linear expression like:

y = mx + b

And then in ML, there is yet a different definition which is what Mubsi and others have been describing on this thread in the “bias/variance” tradeoff discussion.
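
For reference, the bias/variance sense of the word is usually written as a decomposition of the expected squared error. The notation below is not from this thread; it assumes data generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and a predictor $\hat f$ learned from a random training set:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Here bias measures how far the average prediction is from the truth, which has nothing to do with the intercept b in y = mx + b.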


We calculate accuracy for all the labels together; precision and recall are calculated on favourable and non-favourable labels. For the multiclass case, let’s say we have cat, dog, and mouse: we will calculate them for cat/non-cat, dog/non-dog, and mouse/non-mouse.
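
A minimal sketch of that one-vs-rest calculation follows. The ten labels and predictions are invented for the example, and `one_vs_rest_report` is a hypothetical helper name, not a library function.

```python
def one_vs_rest_report(y_true, y_pred):
    """Precision and recall per class, treating every other class as 'not this one'."""
    report = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        report[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return report


# Made-up ground truth and predictions for 10 images.
y_true = ["cat", "cat", "cat", "dog", "dog", "dog", "mouse", "mouse", "mouse", "mouse"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat", "mouse", "mouse", "cat", "mouse"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = one_vs_rest_report(y_true, y_pred)

print(f"accuracy={accuracy:.2f}")
for label, scores in report.items():
    print(f"{label:5s} precision={scores['precision']:.2f} recall={scores['recall']:.2f}")
```

Accuracy is a single number for all three classes, while the report shows, for instance, that "cat" predictions are right only half the time even though every "mouse" prediction is correct.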

I think the definition of Juan_Olano is also correct, but the concept is multifaceted. In the equation y = mx + b, the bias term (b) represents the y-intercept, or the point at which the line crosses the y-axis. Similarly, in a machine learning model, bias can refer to an error that is introduced when the model is trained with a certain set of assumptions or parameters.

When a machine learning model is said to be biased because it only selects males, it typically means that it has been trained on data that mostly or only includes males, and as a result, it may be less accurate or less fair when making predictions about females. This is an example of selection bias, where the sample used to train the model is not representative of the population it is intended to make predictions about.

In this case, the bias term in the equation y = mx + b would correspond to the point where the line crosses the y-axis when x only takes the values of males. If the sample used to train the model only includes males, the line that represents the model will be defined only by the points corresponding to males, meaning that the y-intercept may differ from the one obtained if the sample included both males and females.

The intercept in a linear equation, represented by the bias term (b) in the equation y = mx + b, can be referred to as a bias because it represents a fixed value that is added to the equation regardless of the input variable x.

In the context of machine learning, bias can refer to an error that is introduced when a model is trained with a certain set of assumptions or parameters. A biased model is one that is more likely to make certain types of errors, such as consistently over- or under-estimating certain values.

In this sense, the intercept term, also known as the bias term, can introduce bias in a model because it can shift the model in a certain direction, independent of the input variable, and this can cause errors in predictions. This may be why the term bias is often used to refer to the intercept term in a linear model.

It’s also important to note that the term “bias” can also imply a notion of unfairness, since bias can cause the model to be more accurate or precise for certain groups in the data, but less accurate or precise for others.