Doubt regarding skewed datasets

The professor introduces two new metrics for measuring performance on a skewed dataset: precision and recall. We see that the trivial y=0 model has zero recall and undefined precision, even though it reaches 99.5% accuracy when only 0.5% of the actual data are positive cases. A model with just 99% accuracy, however, can have better precision and recall in this case.
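To make the comparison concrete, here is a minimal sketch that computes accuracy, precision, and recall from confusion-matrix counts. The 1000-example dataset and the counts for the "trained" model are hypothetical numbers picked only to match the 0.5% positive rate and the 99% accuracy mentioned above; they do not come from the lecture.

```python
from typing import Optional


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all examples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp: int, fp: int) -> Optional[float]:
    """TP / (TP + FP); None when the model never predicts positive (0/0)."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else None


def recall(tp: int, fn: int) -> Optional[float]:
    """TP / (TP + FN); None when the data has no positive examples."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else None


# Hypothetical dataset: 1000 examples, 5 positives (0.5%).
# The y=0 model predicts negative for everything.
always_zero = {"tp": 0, "fp": 0, "fn": 5, "tn": 995}
# Made-up counts for a trained model with 99% accuracy.
trained = {"tp": 4, "fp": 9, "fn": 1, "tn": 986}

for name, c in [("y=0", always_zero), ("trained", trained)]:
    print(
        name,
        "accuracy:", accuracy(**c),
        "precision:", precision(c["tp"], c["fp"]),
        "recall:", recall(c["tp"], c["fn"]),
    )
```

With these counts the y=0 model wins on accuracy but has recall 0 and no defined precision, while the lower-accuracy model finds 4 of the 5 positive cases.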

Would it not be better, then, to manipulate the data so that it doesn’t reflect real-world statistics and instead contains more positive cases than negative ones, deliberately making sure that wrong models like y=0 don’t outperform actual trained models?

My main doubt is this: if my model has good precision and recall but lower accuracy than y=0, how do I tell which one is actually better?

With a skewed dataset, neither precision nor recall is very good by itself.
Combining them into the F1 score is helpful.
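As a rough sketch of how that works, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), which can also be written directly in terms of counts as 2·TP / (2·TP + FP + FN). The counts below are hypothetical (a 1000-example set with 5 positives), and returning 0 when there are no positives at all is a convention assumed here, not something stated in the course.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion-matrix counts: 2*TP / (2*TP + FP + FN).

    Algebraically equal to 2*P*R / (P + R) whenever precision P and
    recall R are both defined. Returns 0.0 when the denominator is zero
    (an assumed convention for the degenerate case).
    """
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# y=0 model: never predicts positive, so it misses all 5 positives.
print("y=0 F1:", f1_score(tp=0, fp=0, fn=5))

# Hypothetical trained model with 99% accuracy on the same data.
print("trained F1:", round(f1_score(tp=4, fp=9, fn=1), 3))
```

Under these assumed counts the y=0 model scores an F1 of 0 despite its 99.5% accuracy, so a single F1 number ranks the trained model above it.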
