Doubt regarding skewed datasets

Adeesh · June 4, 2023, 2:12am

The professor introduces two metrics for measuring performance on a skewed dataset: precision and recall. The trivial y=0 model has zero recall and undefined precision, even though it reaches 99.5% accuracy when only 0.5% of the actual examples are positive. A model with just 99% accuracy, however, can have better precision and recall in this case.

Would it not be better, then, to manipulate the data so that it no longer reflects real-world statistics and instead contains more positive cases than negative ones, deliberately making sure that wrong models like y=0 don’t outperform actually trained models?

My main doubt is this: if my model has good precision and recall but lower accuracy than y=0, how do I decide which one is better?
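To make the numbers in the question concrete, here is a minimal sketch in plain Python. It assumes a dataset of 1,000 examples with 5 positives (0.5%). The confusion-matrix counts for the "trained" model are hypothetical, chosen only so that its accuracy comes out at 99%; the helper name `metrics` is likewise just for illustration.

```python
def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall).

    Precision or recall is None when its denominator is zero (undefined).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return accuracy, precision, recall


# y=0 model: never predicts positive, so TP = FP = 0.
always_zero = metrics(tp=0, fp=0, fn=5, tn=995)

# Hypothetical trained model: finds 4 of the 5 positives, with 9 false alarms.
trained = metrics(tp=4, fp=9, fn=1, tn=986)

for name, (acc, prec, rec) in [("y=0", always_zero), ("trained", trained)]:
    print(f"{name}: accuracy={acc}, precision={prec}, recall={rec}")
```

Under these assumed counts, the y=0 model wins on accuracy (99.5% vs 99%) but catches none of the positives, while the trained model catches 4 out of 5. Accuracy alone hides that difference.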

TMosh · June 5, 2023, 5:43pm

With a skewed dataset, neither precision nor recall is very good by itself.
Combining them into the F1 score is helpful.
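As a rough illustration, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below assumes a dataset with 0.5% positives and uses hypothetical precision and recall values for a trained model (4 true positives, 9 false positives, 1 false negative). Treating an undefined precision as F1 = 0 is a common convention, adopted here as an assumption; the function name `f1` is only for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall.

    Assumption: if either value is undefined (None), or both are zero,
    the score is reported as 0.0.
    """
    if precision is None or recall is None or precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# y=0 model: precision is undefined (no positive predictions), recall is 0.
f1_always_zero = f1(None, 0.0)

# Hypothetical trained model: precision = 4/13, recall = 4/5.
f1_trained = f1(4 / 13, 4 / 5)

print(f"y=0 model F1:     {f1_always_zero:.3f}")
print(f"trained model F1: {f1_trained:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a model cannot score well on F1 by doing well on only one of precision or recall, and the y=0 model ends up at the bottom despite its high accuracy.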
