I really want to see what other people think about this.
Let’s say mortgage lender asks for a model to predict people’s ability to pay off credit. They provide their dataset and we get to work. I’m 99% certain that this dataset will include age, gender, marital status etc.
Now, let’s say we declared age and/or gender protected attributes and want to change model outputs to allow for equal opportunity between men and women, 20 year old and 60 year old people to get a mortgage. Yes, we do have antidiscrimination laws in place. But age, gender etc. are good composite/aggregation attributes. They succinctly sum up (to a degree of course) our socioeconomic roles and ability to generate income. On one hand, the model could have just collapsed all these additional attributes (if it had them) and started “listening” to these two. On the other hand, removing these attributes may render precision of the model insufficient for our client’s purposes and they will be losing money. This will not sit well with them.
Adding extra attributes into the dataset is not possible since it has been collected over the course of the last 15 years.
So what’s the course of action in this case? It’s not about how to do it, it’s about what to do to make your client happy and comply with all the laws.