General enquiry on how to mitigate data bias

Dear mentors, in Course 1, Week 2, the data bias in the women's clothing reviews was analyzed. I am wondering: after data bias is detected, what features or functions offered by AWS SageMaker can be used to mitigate it, e.g., to make the dataset more balanced? Looking forward to your reply. Thanks!

Hi @thicc_fart,

Data bias is an active area of research; the field that works on this topic is fairness and explainability. Amazon SageMaker uses Clarify for working on data bias, so you might want to have a look at it. For tabular data with class imbalance, SMOTE is commonly used, although I got better results with the class_weight parameter available in many sklearn ML models. Today I came across this news, and I don't know whether Clarify will be able to resolve this issue. Fairness remains an active area of research and requires our attention. If you are interested in learning how to tackle it, you can start from here.
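
To make the two rebalancing ideas above a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under some assumptions: the toy labels and the helper names balanced_class_weights and random_oversample are made up for this post, and the oversampling shown is simple random duplication of minority rows, not SMOTE (SMOTE synthesizes new points by interpolating between neighbours and lives in the separate imbalanced-learn package). The weight formula, n_samples / (n_classes * class_count), is the one sklearn documents for class_weight="balanced".

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class (not SMOTE)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - cnt)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: 1 = positive, 0 = neutral, -1 = negative review.
labels = [1, 1, 1, 1, 1, 1, 0, 0, -1, -1]
rows = [f"review_{i}" for i in range(len(labels))]

weights = balanced_class_weights(labels)
new_rows, new_labels = random_oversample(rows, labels)
print(weights)
print(Counter(new_labels))
```

A dictionary like weights can be passed as class_weight=weights to sklearn models that accept that parameter, and passing class_weight="balanced" asks sklearn to compute the same weights itself. Reweighting keeps the dataset unchanged, whereas oversampling changes the training set, so it should be applied only to the training split.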

Best Regards,
A. Sriharsha