In machine learning problems where we have lots of data (a large structured dataset), Prof. Ng said that algorithms can surpass human-level performance relatively easily. How can we set the Bayes error, or a proxy for it, in such a case, so that we can still identify avoidable bias or variance?
When human-level performance is surpassed, you can approximate Bayes error using some strategies. For example, if the data set contains noisy labels or inherent ambiguity (e.g., images with poor lighting, sensor errors, etc.), this introduces a form of irreducible error. Looking at data noise closely can give you an indication of the minimum possible error rate.
Additional strategies involve leveraging expert consensus or benchmark results. After estimating the Bayes error, you can use it to pinpoint avoidable bias and variance by comparing your model’s training and validation errors to that theoretical minimum.
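As a minimal sketch of that last step (the error numbers below are invented, and the Bayes proxy is assumed to come from one of the strategies above):

```python
# Hypothetical error rates; the Bayes error proxy could come from data-noise
# analysis, expert consensus, or benchmark results as described above.
bayes_error_proxy = 0.02   # assumed irreducible error (2%)
train_error = 0.08
dev_error = 0.10

avoidable_bias = train_error - bayes_error_proxy   # 0.06
variance = dev_error - train_error                 # 0.02

if avoidable_bias > variance:
    print("Focus on reducing bias: bigger model, longer training, etc.")
else:
    print("Focus on reducing variance: more data, regularization, etc.")
```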
@MustafaaShebl @nadtriana Nayid, please correct me if I am remembering wrong here, but I feel like Prof. Ng’s point was the opposite-- That beating human level performance is not easy (plus, then, how on Earth would we know how to evaluate that ?)-- Thus the whole reason for considering Bayes error.
I could be recalling incorrectly.
You’re right! I appreciate your clarification. Prof. Andrew Ng emphasizes that it is generally quite difficult to beat human-level performance, especially in many real-world tasks such as image recognition, speech recognition, and medical diagnosis. His point is that human-level performance is a good proxy for the Bayes error on most tasks because humans are highly capable. Surpassing human-level performance is challenging and only happens in special cases (such as structured data). If you do surpass human performance, you need to find other ways (such as expert consensus) to estimate the Bayes error. Thanks again for pointing this out!
Thanks for the reply, I have a small question.
Can experts give a good estimate of the Bayes error when dealing with, for example, databases of millions of records about a certain problem? Is it something related to statistics, like plotting the data to derive a human-level performance figure? I find that dealing with huge datasets is a real challenge for humans, which is why Prof. Andrew said it is relatively easy for computers to surpass human-level performance there.
Great question! Experts alone cannot estimate Bayes error directly in large datasets, especially when machines outperform humans. Instead, we rely on statistical techniques and high-performing models to approximate Bayes error. In a large structured dataset, if multiple state-of-the-art algorithms produce similar error rates, that error rate can be used as a practical estimate for the Bayes error. This is a key reason why machine learning is successful in fields such as large-scale image recognition, financial data analysis, and other structured problems where humans have difficulty processing the full complexity of the data.
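If it helps, here is a rough sketch of that idea (the model names and error rates below are invented):

```python
# Dev-set error rates of several strong, independently built models
# (hypothetical numbers for illustration).
model_errors = {
    "gradient_boosting": 0.031,
    "deep_nn": 0.029,
    "ensemble": 0.027,
}

# The best observed error is an upper bound on the Bayes error: the true
# irreducible error can only be at or below what the best model achieves,
# so in practice it serves as a working proxy.
bayes_error_proxy = min(model_errors.values())
print(f"Practical Bayes error proxy: {bayes_error_proxy:.3f}")
```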
I really like @nadtriana’s response. As I tell others, when I first learned this, working on a huge dataset without these techniques or methods was like learning that ‘the blind can see’.
That said, we can still be blind: a huge dataset can also obfuscate all the variables.
I know he is smarter and better equipped than I am, but elsewhere this was a point I parlayed today.
I would actually love a serious challenge to that one-- what is it I have wrong about gravity?
Adding to what @nadtriana stated, Bayes’ theorem applies statistical probabilities to a given set of events; in machine learning it is usually used to determine the likelihood of an outcome.
The most important point to remember is that this analysis is done when a model has surpassed human-level performance, so the error probability is much lower.
So Bayes’ theorem lets us calculate the probability of a hypothesis, P(A|B) = P(B|A) P(A) / P(B), based on its prior probability P(A), the probability of the observed data P(B), and the probability of observing the data given the hypothesis, P(B|A).
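As a small numeric illustration (all the probabilities below are invented for the example, using a spam-filter flavour):

```python
def bayes_posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical numbers: 40% of mail is spam, the word "free" appears in
# 60% of spam messages and in 30% of all messages overall.
p_spam_given_free = bayes_posterior(prior=0.4, likelihood=0.6, evidence=0.3)
print(f"P(spam | contains 'free') = {p_spam_given_free:.2f}")  # 0.80
```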
@nadtriana @Deepti_Prasad Thanks so much for your concern.
I know that we can take a state-of-the-art algorithm’s error as our Bayes error, but I think it raises the same worry: we can’t determine what these algorithms’ optimal error is.
I understood from your clarification that we use statistical probabilities to determine the Bayes error from the dataset, but I couldn’t relate the equation @Deepti_Prasad stated to how we estimate the Bayes error. If you could give more explanation, I would be more than thankful.
Happy to explain!
Bayes error doesn’t directly determine bias or variance, but it is usually used when the variance or bias related to the dataset is neither minimal nor too high, and yet we get a 99% model prediction.
So at that time, Bayes error analysis helps us compare our data-analysis hypothesis, based on the result we got, against what the true value really is.
Suppose the Bayes error is non-zero; then the two classes/events/hypotheses have some overlap, and by assumption even the best model will make some wrong predictions. (Prof. Ng mentions this in the course videos, but I am not sure in which particular video.)
There are many possible reasons for a dataset to have a non-zero Bayes Error. For example:
Poor data quality: Some images in a computer vision dataset are very blurry.
Mislabelled data: The labelling process is inconsistent (see the small simulation after this list).
Incorrect dataset splits or insufficient data: The data is divided badly, or there is simply too little of it.
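To make the mislabelled-data case concrete, here is a minimal simulation, assuming labels are flipped uniformly at random at a 5% rate (the rate is an arbitrary choice for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, noise_rate = 100_000, 0.05   # assume 5% of labels are flipped at random

true_labels = rng.integers(0, 2, size=n)
flip = rng.random(n) < noise_rate
observed_labels = np.where(flip, 1 - true_labels, true_labels)

# Even a "perfect" classifier that always predicts the true label
# disagrees with the observed labels at roughly the noise rate, so no
# model can score better than that on this data: a non-zero Bayes error.
perfect_error = np.mean(observed_labels != true_labels)
print(f"Error floor of a perfect classifier: {perfect_error:.3f}")  # ~0.05
```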
So imagine we are getting a 99% prediction accuracy for a model we created based on feature A, but there is either high variance or high bias in the dataset.
So now the Bayes error analysis would basically check whether the dataset has a feature B which also gives similar predictions in the presence or absence of feature A, and whether the resulting error is non-zero.
The reason for such error analysis is that sometimes a model, once created and trained, shows great results, but when tested on a similar dataset, its performance is poor or not as good as the training results.
At this point we can be fairly sure that the first model, the one that showed a 99% prediction accuracy, is overfitting or doesn’t have enough data to reduce the variance or bias behind that 99% result.
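To illustrate that pattern (great training results, poor results on held-out data), here is a small self-contained sketch; the synthetic data and the model choice are just for demonstration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, somewhat noisy binary classification data (demonstration only;
# flip_y injects label noise so a perfect score is impossible on new data).
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1,
                           random_state=42)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.3,
                                                  random_state=42)

# An unconstrained tree can memorize the training set.
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)   # close to 100%
dev_acc = model.score(X_dev, y_dev)         # noticeably lower
print(f"train={train_acc:.2%}, dev={dev_acc:.2%}")
# A large train/dev gap signals high variance (overfitting), exactly the
# situation where a 99% training score should not be trusted.
```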
Regards
DP
Thanks for the clarification @Deepti_Prasad, but as far as I understood, we still need a reference. Suppose I’m working on a problem like spam/non-spam email classification. We have lots of data, but it may be insufficient to reach 100% model prediction (as in many cases). When I initialize the model and observe the train error and dev error to identify bias and variance and tune the model, I still can’t see what I should compare my train error against to get the avoidable bias and decide how to tune my model.
I know that for a common problem I can use a state-of-the-art model’s performance as my reference, but I didn’t get how Bayes’ theorem could help us determine the Bayes error or a proxy for it.
Thanks again for your time.
I am really glad you mentioned spam/not-spam analysis; now I understand that you have some background understanding of data analysis.
So suppose that, based on feature selection, advertisements are supposed to be recalled as spam, while the features of my YouTube channel promotions need to be tagged as not spam. The model ends up with 99% classification accuracy, but then when you used the same model, you were still getting my YouTube promotions as not spam, even though for you they are spam. Wouldn’t that mean the dataset really didn’t do a great job of detecting user feature variability?
So at this point, the Bayes error analysis compares your observed data with the actual (my) data. It first assumes that either my model’s data, which predicted 99%, is correct, or the result on your data, where the prediction was much lower than my model’s. A non-zero Bayes error value indicates that the datasets on both ends need to be looked at for normalisation, optimization, or more data, or checked for a better ratio between true labels (my data features) and predicted labels (your data features).
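To make that concrete, here is a toy simulation of the spam example; the feature distributions and numbers are entirely invented. A model trained on one user’s mail degrades on another user’s mail whose features are shifted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_user_data(n, spam_mean, ham_mean):
    """Toy 2-feature 'emails': spam and ham drawn from Gaussian clusters."""
    spam = rng.normal(spam_mean, 1.0, size=(n // 2, 2))
    ham = rng.normal(ham_mean, 1.0, size=(n // 2, 2))
    X = np.vstack([spam, ham])
    y = np.array([1] * (n // 2) + [0] * (n // 2))
    return X, y

# User A: well-separated spam/ham. User B: the "promotion" emails sit much
# closer to the spam cluster, i.e. the feature distribution has shifted.
X_a, y_a = make_user_data(2000, spam_mean=2.0, ham_mean=-2.0)
X_b, y_b = make_user_data(2000, spam_mean=2.0, ham_mean=1.0)

model = LogisticRegression().fit(X_a, y_a)
print(f"accuracy on user A: {model.score(X_a, y_a):.2%}")  # high
print(f"accuracy on user B: {model.score(X_b, y_b):.2%}")  # much lower
```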
Regards
DP
Is it like I’m chasing zero Bayes error and trying to examine the datasets to reach it? As in the example illustrated, the model did well in your case (actual, training) but did poorly in my case (predicted, validation). If we used the 99% model on the actual data, then we can say the model is overfitting if it does poorly on the predicted data. This is the case where humans don’t do well on such a task, like spam/not spam, so we can’t use human-level performance, and therefore we wish for a zero Bayes error.
Am I getting it right, or did I lose the path completely?
Have you heard about reinforcement learning with human feedback (RLHF) for LLMs?
It actually does help. That’s where data annotators come into the picture: specialised data reviewers who have knowledge about the data being used to create a model.
For example, medical/clinical reviewers or clinical specialists act as data annotators for healthcare data science projects; they thoroughly understand the medical, clinical, and trial aspects of a medicine before it goes into production or becomes available to the public.
They go through large amounts of data to find any discrepancies in diagnoses, symptoms, signs, radiographs, or images related to a given diagnosis or disease, thereby reducing both the bias and the variance for that particular project.
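Here is a minimal sketch of how annotator agreement can be turned into a rough noise estimate, assuming several reviewers label the same items independently (the accuracy and counts below are invented):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: 5 expert annotators each label the same 1000 cases,
# and each is independently correct about 96% of the time.
n_items, n_annotators, expert_acc = 1000, 5, 0.96
truth = rng.integers(0, 2, size=n_items)
votes = np.where(rng.random((n_annotators, n_items)) < expert_acc,
                 truth, 1 - truth)

# Majority vote gives a cleaner consensus label, and the rate at which
# individual experts disagree with the consensus is a rough proxy for
# the irreducible labelling noise in the dataset.
consensus = (votes.mean(axis=0) > 0.5).astype(int)
disagreement = np.mean(votes != consensus)
print(f"average expert disagreement with consensus: {disagreement:.3f}")
```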
So is it the duty of the application specialists/experts to provide the Bayes error? Can we say that there are tools (like statistics and probability) that can give a reasonable value for the Bayes error?
I think my problem is imagining how this can be done, but I guess it is a career path, not an easy thing to pick up: you can spend a lot of time enhancing your estimate of the Bayes error and thus enhancing the model.
The more you practice programming in data science, the more you will understand.
There are even specialisations and exercises that will come up in your learning journey here, which will help your imagination catch up with what Prof. Ng mentioned early in the DLS course.
So I would just end on a good note and say,
Keep Learning!!!
Regards
DP
I would like to thank you very much for your effort and concern; it is really appreciated.