In the first video on data validation, Distribution Skew is defined as a significant divergence between the training and serving datasets that can manifest as:

dataset shift (P_train(x, y) != P_serve(x, y)),

covariate shift (P_train(x) != P_serve(x)), and

concept shift (P_train(y|x) != P_serve(y|x)).
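To make the three definitions concrete, here is a minimal sketch on a hypothetical discrete example (a binary feature x and a binary label y, with made-up probabilities). It checks each definition directly: the joint P(x, y), the marginal P(x), and the conditional P(y|x). The helper names are illustrative, not from the course.

```python
# Hypothetical discrete example: x in {0, 1}, y in {0, 1}.
# A joint distribution is a dict mapping (x, y) -> probability.

def marginal_x(joint):
    """P(x): sum the joint over y."""
    px = {}
    for (x, _y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
    return px

def conditional_y_given_x(joint):
    """P(y|x) = P(x, y) / P(x), defined only where P(x) > 0."""
    px = marginal_x(joint)
    return {(x, y): p / px[x] for (x, y), p in joint.items() if px[x] > 0}

def differs(a, b, tol=1e-9):
    """True if two probability tables differ on any key."""
    keys = set(a) | set(b)
    return any(abs(a.get(k, 0.0) - b.get(k, 0.0)) > tol for k in keys)

def shift_report(train, serve):
    """Apply the three definitions from the video to two joint tables."""
    return {
        "dataset_shift": differs(train, serve),
        "covariate_shift": differs(marginal_x(train), marginal_x(serve)),
        "concept_shift": differs(conditional_y_given_x(train),
                                 conditional_y_given_x(serve)),
    }

def build_joint(px, py_given_x):
    """Build P(x, y) = P(y|x) * P(x) from a marginal and a conditional."""
    return {(x, y): p * px[x] for (x, y), p in py_given_x.items()}

# Same P(y|x), different P(x): pure covariate shift.
cond = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}
train = build_joint({0: 0.5, 1: 0.5}, cond)
serve_cov = build_joint({0: 0.8, 1: 0.2}, cond)

# Same P(x), different P(y|x): pure concept shift.
cond_new = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.2, (1, 1): 0.8}
serve_con = build_joint({0: 0.5, 1: 0.5}, cond_new)

print(shift_report(train, serve_cov))
# -> {'dataset_shift': True, 'covariate_shift': True, 'concept_shift': False}
print(shift_report(train, serve_con))
# -> {'dataset_shift': True, 'covariate_shift': False, 'concept_shift': True}
```

In this toy example both the pure covariate case and the pure concept case also register as a change in the joint distribution.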

When features differ between training and serving due to different data transformation that is applied during training and serving, isn’t this a valid example of covariate shift, since P_train(x) and P_serve(x) will be different? That would make it one variant of Distribution Skew. (This relates to one of the quiz questions; I’m trying not to reveal too much.)
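A minimal sketch of the scenario in the question, with hypothetical numbers: the raw feature values are identical in both environments, but training standardizes them (z-score) while a hypothetical serving pipeline applies min-max scaling instead. The model-input feature x then has a different distribution at serving time even though the raw data did not change.

```python
import statistics

# Hypothetical raw feature values: identical at training and serving.
raw_train = [10.0, 12.0, 14.0, 16.0, 18.0]
raw_serve = [10.0, 12.0, 14.0, 16.0, 18.0]

# Training pipeline: z-score standardization using training statistics.
mu = statistics.mean(raw_train)
sigma = statistics.pstdev(raw_train)
x_train = [(v - mu) / sigma for v in raw_train]

# Hypothetical serving pipeline: min-max scaling instead of z-scores.
lo, hi = min(raw_serve), max(raw_serve)
x_serve = [(v - lo) / (hi - lo) for v in raw_serve]

# The transformed features the model sees no longer match:
# x_train is centred near 0, x_serve is centred at 0.5.
print(statistics.mean(x_train), statistics.mean(x_serve))
```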

You say: ‘due to different data transformation that is applied during training and serving…’

But normally you must apply the same transformations during training and serving, unless the two datasets have different characteristics. I can imagine wanting to apply different transformations, for example to correct a data skew, but in my honest opinion the question is not set up correctly.
Could you give an example of why you would want different transformations in training and serving?
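The usual practice described in this reply can be sketched as follows, assuming a simple z-score transform: the transformation parameters are learned once from the training data and then reused unchanged for serving requests. The function names fit_standardizer and transform are hypothetical.

```python
import statistics

def fit_standardizer(values):
    """Learn z-score parameters (mean, population std) from training data only."""
    return statistics.mean(values), statistics.pstdev(values)

def transform(values, params):
    """Apply stored training parameters; the same function runs at serving time."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

raw_train = [10.0, 12.0, 14.0, 16.0, 18.0]
params = fit_standardizer(raw_train)  # fit once, at training time

x_train = transform(raw_train, params)
x_serve = transform([14.0, 18.0], params)  # serving reuses the training params
```

Because both paths call the same function with the same stored parameters, a raw value of 18.0 maps to the same model input whether it appears in training or in a serving request.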

I don’t quite get dataset shift. Looking at the formula, a dataset shift can arise as a consequence of either a covariate shift or a concept shift (mathematically, one may have the same joint distribution in either of these two cases, but that wouldn’t really happen in real life).
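The formula-level reasoning in this post rests on the chain-rule factorization of the joint distribution, written out here for both datasets (standard probability, not something specific to the course):

```latex
P_{\text{train}}(x, y) = P_{\text{train}}(y \mid x)\, P_{\text{train}}(x),
\qquad
P_{\text{serve}}(x, y) = P_{\text{serve}}(y \mid x)\, P_{\text{serve}}(x)
```

Covariate shift compares the second factor and concept shift compares the first, while dataset shift compares the product as a whole.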

The problem I have is that I can’t give a proper definition of dataset shift (as a standalone type of shift) without defining it as a consequence of either covariate or concept shift.