Why are relative frequencies said to be a problem?

popaqy · October 13, 2022, 4:03pm

Yes, real-world datasets are not balanced. But we introduced the concept of the “log prior”, which does a great job of dealing with unbalanced sets. So why are unbalanced sets said to be a problem for Naive Bayes?
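For reference, here is a minimal sketch of the log prior being discussed. It assumes the usual binary formulation in which the log prior is log(D_pos / D_neg) and a tweet's score is the log prior plus the sum of per-word log-likelihood ratios. The function names, the class counts, and the toy `loglikelihood` table are made up for illustration and are not taken from the course code.

```python
import math

def log_prior(n_pos, n_neg):
    """Log ratio of class counts: log(D_pos / D_neg)."""
    return math.log(n_pos) - math.log(n_neg)

def naive_bayes_score(words, loglikelihood, logprior):
    """Log prior plus the summed log-likelihood ratios of the words.

    Words missing from the table contribute 0 (treated as neutral).
    A score above 0 is read as positive, below 0 as negative.
    """
    return logprior + sum(loglikelihood.get(w, 0.0) for w in words)

# Hypothetical class counts for a balanced and an unbalanced training set.
balanced = log_prior(5000, 5000)
skewed = log_prior(9000, 1000)

# Hypothetical log-likelihood ratio for a single mildly negative word.
loglikelihood = {"bad": -1.0}

score_balanced = naive_bayes_score(["bad"], loglikelihood, balanced)
score_skewed = naive_bayes_score(["bad"], loglikelihood, skewed)
```

With these made-up numbers the same tweet scores below zero on the balanced set and above zero on the skewed one, so the prior alone moves the decision toward the majority class. Whether that shift helps or hurts depends on whether the class ratio in the training data matches the ratio in the data being classified.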

nmurugesh · October 14, 2022, 3:23am

Yes, I also only remember reading about the positional independence of features (for example, of words within a tweet) as the second assumption of Naive Bayes, not about the relative frequencies of examples/samples.

Maryam · October 15, 2022, 9:35am

That’s an interesting question, actually. You might be interested in this paper: “An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics”, which compares three methods of improved classification on unbalanced data. I hope it helps.

Vu_Hoang_Ngo · October 17, 2022, 5:15am

Great question. I think you may find this article helpful: “Naive Bayes Classifier: Pros & Cons, Applications & Types Explained” (upGrad blog).
