Week 2 - Balancing the dataset

I did not understand the logic used for balancing the dataset. We just grouped the data. That reduces the number of rows, but how does it balance the data?

The methods I have used in the past are oversampling and undersampling, although I believe those were only for target class imbalance.


Hi @anurag.mitra54 ,
You’re right. To balance a dataset you can either use undersampling or oversampling techniques. Undersampling techniques remove examples from the training dataset that belong to the majority class in order to better balance the class distribution.

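That is what the pandas groupby step does here: it is a form of undersampling. The rows are grouped by class, the class with the smallest number of instances is found, and instances are then dropped from the other classes until every class has that same number of rows, which makes the dataset balanced. You may find this Stack Overflow post helpful.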
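To make the idea concrete, here is a minimal sketch of groupby-based undersampling. It is not the course notebook's exact code: the toy DataFrame and the column names `sentiment` and `review` are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: 6 positive, 3 neutral and 2 negative reviews.
df = pd.DataFrame({
    "sentiment": [1] * 6 + [0] * 3 + [-1] * 2,
    "review": [f"review {i}" for i in range(11)],
})

# Group the rows by class and find the size of the smallest class.
grouped = df.groupby("sentiment")
min_size = int(grouped.size().min())

# Randomly keep min_size rows from every class and drop the rest.
balanced = pd.concat(
    [group.sample(n=min_size, random_state=0) for _, group in grouped]
).reset_index(drop=True)

print(balanced["sentiment"].value_counts())
```

Grouping alone does not change the data. The balancing comes from sampling every group down to the size of the smallest one, which is why the number of rows goes down.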

Note: For more visibility, please post your question in the specific course channel. You can see PDS C1 and PDS C2 here: https://community.deeplearning.ai/c/pds/36
Please choose the appropriate channel and post your question there so that mentors can help you further.
