At 2min26s, in the following sentence, I don’t understand what is meant by “sampling and re-weighting rows”:
“Careful feature selection, including sampling and reweighting rows to minimize discrimination in training data can also be helpful when training.”
Sampling of what, and how?
Re-weighting what rows? Rows in a tensor? Do different rows in a tensor ever have different weights? I thought the same weights apply to all training examples; how else could the model learn a single function that is then used to make predictions on eval/test/serving data?
Hello @shahin,
Thanks for raising this question.
“Sampling and re-weighting rows” refers to sampling and re-weighting examples (i.e. rows of the training data) from a specific subgroup, namely the subgroup against which we want to prevent bias.
With sampling, we can for instance collect more data on a particular subgroup, or draw its existing examples more often, in order to reduce bias against that subset of the data.
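A minimal sketch of the resampling idea in plain Python, assuming the training data is a list of dicts with a hypothetical `group` column. It oversamples (with replacement) the rows of an under-represented subgroup; `oversample_group` and the factor of 3 are illustrative choices, not a specific library's API. Collecting genuinely new data for the subgroup is usually preferable, but that step can't be shown in code.

```python
import random

def oversample_group(rows, group_key, target_group, factor, seed=0):
    """Hypothetical helper: return a shuffled copy of `rows` in which rows whose
    `group_key` equals `target_group` appear about `factor` times as often."""
    rng = random.Random(seed)
    subgroup = [r for r in rows if r[group_key] == target_group]
    # Draw (factor - 1) * len(subgroup) extra rows from the subgroup, with replacement
    n_extra = int((factor - 1) * len(subgroup))
    extra = [rng.choice(subgroup) for _ in range(n_extra)]
    resampled = rows + extra
    rng.shuffle(resampled)
    return resampled

# Toy dataset: 8 rows from group "a", only 2 rows from under-represented group "b"
rows = [{"group": "a", "x": i} for i in range(8)] + [{"group": "b", "x": i} for i in range(2)]
balanced = oversample_group(rows, "group", "b", factor=3)
print(len(balanced))                                  # 14
print(sum(1 for r in balanced if r["group"] == "b"))  # 6
```

The model itself is unchanged: it simply sees subgroup "b" rows more often during training.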
Re-weighting during training means giving more or less importance to the examples of a particular subset (e.g. by attributing a higher loss to those examples, so that the model tries harder to minimize its error on that subset). It does not mean modifying the model's weights for this subset only: all examples, during training as well as eval and serving, go through the same model and see the same model weights.
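To make the distinction concrete, here is a toy sketch in plain Python (no ML library assumed): a one-feature logistic model trained by gradient descent on a log loss in which each example's term is multiplied by a per-example weight. The helper names (`predict`, `weighted_log_loss`, `train`), the group labels and the 3x weight are hypothetical. Note that `predict` uses the single pair of parameters `(w, b)` for every example; the per-example weights only scale each example's contribution to the loss and its gradient.

```python
import math

def predict(w, b, x):
    """Logistic model: the SAME parameters (w, b) are used for every example."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def weighted_log_loss(w, b, xs, ys, sample_weights):
    """Weighted average log loss: each example's term is scaled by its weight."""
    total = 0.0
    for x, y, s in zip(xs, ys, sample_weights):
        p = predict(w, b, x)
        total += -s * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(sample_weights)

def train(xs, ys, sample_weights, lr=0.5, steps=500):
    """Plain gradient descent on the weighted log loss."""
    w, b = 0.0, 0.0
    norm = sum(sample_weights)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y, s in zip(xs, ys, sample_weights):
            err = predict(w, b, x) - y  # d(log loss)/d(logit) for one example
            gw += s * err * x
            gb += s * err
        w -= lr * gw / norm
        b -= lr * gb / norm
    return w, b

# Toy data: two examples with the same feature value but conflicting labels.
# The second one belongs to the (hypothetical) subgroup "b" we want to up-weight.
xs = [1.0, 1.0]
ys = [1, 0]
groups = ["a", "b"]
sample_weights = [3.0 if g == "b" else 1.0 for g in groups]

w_unif, b_unif = train(xs, ys, [1.0, 1.0])
w_rew, b_rew = train(xs, ys, sample_weights)
print(predict(w_unif, b_unif, 1.0))  # 0.5: both examples pull equally
print(predict(w_rew, b_rew, 1.0))    # approximately 0.25: pulled toward the up-weighted label
```

Most libraries expose the same idea directly, e.g. the `sample_weight` argument of scikit-learn estimators' `fit` method or of Keras `Model.fit`.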
See the intro of the following paper for some more info on reweighting examples during training:
Hope this clarifies.