Skewed and imbalanced datasets

Greetings! Would you kindly elucidate the distinction between imbalanced data and skewed data within the realm of machine learning?
Thank you.

Hello @raid_athmane_BENLALA

A classification dataset with skewed class proportions is called an imbalanced dataset.

Classes that make up a large proportion of the data set are called majority classes.

Those that make up a smaller proportion are minority classes.

For example, if we have a binary classification problem with 1000 instances, and only 100 of them belong to the positive class (the remaining 900 belong to the negative class), then the dataset is imbalanced. The resulting class distribution, in which the negative class far outnumbers the positive class, is what gives a skewed representation of the data.
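To make the proportions concrete, here is a minimal sketch that counts the classes for the 1000-instance example. The `labels` list is made-up data built to match the numbers above (1 = positive, 0 = negative), and the variable names are only illustrative.

```python
from collections import Counter

# Hypothetical labels matching the example: 100 positive (1), 900 negative (0).
labels = [1] * 100 + [0] * 900

# Count how many instances fall into each class.
counts = Counter(labels)
total = sum(counts.values())

# Fraction of the dataset taken up by each class.
proportions = {cls: n / total for cls, n in counts.items()}

# The majority class is the most frequent one; the rest are minority classes.
majority_class, majority_count = counts.most_common(1)[0]
minority_count = min(counts.values())

# Ratio of majority to minority instances, one simple way to describe the skew.
imbalance_ratio = majority_count / minority_count

print(proportions)
print("majority class:", majority_class)
print("imbalance ratio:", imbalance_ratio)
```

Here the negative class takes up 90% of the data and the positive class 10%, so the majority-to-minority ratio is 9 to 1.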

Regards
DP


Hello @Deepti_Prasad,

Thank you very much for your clear and concise explanation. Your example with class proportions in a classification dataset really helped clarify the concepts for me.

I now understand that in an imbalanced dataset, the class proportions are highly unequal, with one being the majority class and the other being the minority class. This can pose challenges when training a model, as it may tend to favor the majority class, leading to bias in predictions.
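As a rough illustration of that bias, the sketch below scores a trivial "model" that always predicts the majority class on the same hypothetical 100/900 data. It also computes inverse-frequency class weights, a common heuristic for counteracting the imbalance; the weight formula used here (`n_samples / (n_classes * class_count)`) is an assumption for illustration, not something stated in this thread.

```python
from collections import Counter

# Hypothetical labels: 100 positive (1), 900 negative (0).
labels = [1] * 100 + [0] * 900

# A trivial baseline that always predicts the majority (negative) class.
predictions = [0] * len(labels)

# Accuracy looks good because most instances are negative.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the positive class shows the problem: no positives are found.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
positives = sum(y == 1 for y in labels)
recall = true_positives / positives

# Inverse-frequency weights (assumed heuristic): rarer classes get larger weights.
counts = Counter(labels)
n_samples = len(labels)
n_classes = len(counts)
class_weights = {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

print("accuracy:", accuracy)
print("positive recall:", recall)
print("class weights:", class_weights)
```

The baseline reaches 90% accuracy while never detecting a single positive instance, which is why accuracy alone can be misleading on imbalanced data.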

Once again, thank you for your invaluable assistance!

Best regards,
Raid
