Isn't Feature Skew one form of Distribution Skew

dncc · July 7, 2021, 10:53am

In the first video on data validation, Distribution Skew is defined as a significant divergence between the training and serving datasets, which can manifest as:

dataset shift (P_train(x, y) != P_serve(x, y)),
covariate shift (P_train(x) != P_serve(x)), and
concept shift (P_train(y|x) != P_serve(y|x)).
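To make the three definitions concrete, here is a minimal sketch for a single discrete feature x and label y. It assumes that empirical frequencies stand in for the true distributions, and it uses total variation distance with an arbitrary tolerance of 0.05 as the "significant divergence" test. Both choices are illustrative and do not come from the course, and the toy datasets are hypothetical.

```python
from collections import Counter

def joint_dist(pairs):
    # Empirical P(x, y) from a list of (x, y) pairs
    n = len(pairs)
    return {xy: c / n for xy, c in Counter(pairs).items()}

def marginal_x(pairs):
    # Empirical P(x)
    n = len(pairs)
    return {x: c / n for x, c in Counter(x for x, _ in pairs).items()}

def conditionals(pairs):
    # Empirical P(y | x): one distribution over y per observed x
    x_counts = Counter(x for x, _ in pairs)
    out = {}
    for (x, y), c in Counter(pairs).items():
        out.setdefault(x, {})[y] = c / x_counts[x]
    return out

def tv_distance(p, q):
    # Total variation distance between two discrete distributions (dicts)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def shift_report(train, serve, tol=0.05):
    # tol is an arbitrary threshold for a "significant" divergence
    ct, cs = conditionals(train), conditionals(serve)
    shared = set(ct) & set(cs)  # P(y|x) is only comparable where x appears in both
    return {
        "dataset_shift": tv_distance(joint_dist(train), joint_dist(serve)) > tol,
        "covariate_shift": tv_distance(marginal_x(train), marginal_x(serve)) > tol,
        "concept_shift": any(tv_distance(ct[x], cs[x]) > tol for x in shared),
    }

# Hypothetical toy data: feature x in {"a", "b"}, label y in {0, 1}
train = [("a", 0)] * 40 + [("a", 1)] * 10 + [("b", 0)] * 10 + [("b", 1)] * 40
# Same P(y|x), but "a" is now far more common: covariate shift
serve_cov = [("a", 0)] * 64 + [("a", 1)] * 16 + [("b", 0)] * 4 + [("b", 1)] * 16
# Same P(x), but the label relationship is flipped: concept shift
serve_concept = [("a", 0)] * 10 + [("a", 1)] * 40 + [("b", 0)] * 40 + [("b", 1)] * 10

print(shift_report(train, serve_cov))
# → {'dataset_shift': True, 'covariate_shift': True, 'concept_shift': False}
print(shift_report(train, serve_concept))
# → {'dataset_shift': True, 'covariate_shift': False, 'concept_shift': True}
```

In both toy cases the joint distribution moves as well, so each one also registers as a dataset shift.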

When features differ between training and serving due to different data transformation that is applied during training and serving, isn’t this a valid example of covariate shift? P_train(x) and P_serve(x) will then be different, so it would be one variant of Distribution Skew. (This is related to one question in the quiz; I’m trying not to reveal too much.)
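Here is a minimal sketch of the scenario in the question. It assumes a single numeric feature, standardization as the training-time transform, and a hypothetical min-max scaling mistakenly applied at serving time. The raw inputs come from the same distribution in both cases, yet the transformed feature that the model actually sees ends up distributed differently at serving time.

```python
import random
import statistics

def standardize(values, mean, std):
    # Training-time transform: z-score using statistics from the training data
    return [(v - mean) / std for v in values]

def min_max(values, lo, hi):
    # Hypothetical serving-time mistake: min-max scaling instead of z-scoring
    return [(v - lo) / (hi - lo) for v in values]

def summary(xs):
    # (mean, population std) of a transformed feature, rounded for display
    return round(statistics.mean(xs), 2), round(statistics.pstdev(xs), 2)

rng = random.Random(0)
# The raw feature comes from the same distribution in training and serving
raw_train = [rng.gauss(50.0, 10.0) for _ in range(5000)]
raw_serve = [rng.gauss(50.0, 10.0) for _ in range(5000)]

mu, sigma = statistics.mean(raw_train), statistics.pstdev(raw_train)
lo, hi = min(raw_train), max(raw_train)

x_train = standardize(raw_train, mu, sigma)
x_serve_consistent = standardize(raw_serve, mu, sigma)  # same transform
x_serve_skewed = min_max(raw_serve, lo, hi)             # different transform

print("train:            ", summary(x_train))
print("serve, same fn:   ", summary(x_serve_consistent))
print("serve, different: ", summary(x_serve_skewed))
```

The sketch only shows that the distribution of the transformed feature changes. It does not settle whether that change should be labeled covariate shift or feature skew, which is the question being asked.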

Thanks

luigisaetta · July 8, 2021, 10:47am

Hi @dncc

You say: ‘due to different data transformation that is applied during training and serving…’

But normally you must apply the same transformations during training and serving, unless the two datasets have different characteristics. I can imagine wanting to apply different transformations, for example to correct a data skew, but in my honest opinion the question is not set up correctly.
Could you give an example of why you would want different transformations in training and serving?
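To illustrate applying the same transformation in both places, here is a minimal sketch that assumes a single numeric feature and a hypothetical `Standardizer` class. The class is not from the course or from any specific library. Its parameters are fitted once on the training data, exported, and then reloaded at serving time instead of being recomputed there.

```python
import json
import statistics
from dataclasses import dataclass

@dataclass
class Standardizer:
    """Hypothetical transform: z-score a single numeric feature."""
    mean: float = 0.0
    std: float = 1.0

    def fit(self, values):
        # Statistics are computed once, on training data only
        self.mean = statistics.mean(values)
        self.std = statistics.pstdev(values) or 1.0  # guard against zero variance
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def to_json(self):
        # Export the fitted parameters so serving can reuse them exactly
        return json.dumps({"mean": self.mean, "std": self.std})

    @classmethod
    def from_json(cls, s):
        return cls(**json.loads(s))

# Training side: fit and export
train_values = [10.0, 12.0, 14.0, 16.0, 18.0]
scaler = Standardizer().fit(train_values)
exported = scaler.to_json()

# Serving side: load the exported parameters instead of refitting on serving data
serving_scaler = Standardizer.from_json(exported)
print(serving_scaler.transform([14.0, 20.0]))
```

Because the serving side loads the exported parameters, the transformation stays identical in both places by construction. Refitting on serving data would reintroduce the mismatch.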

chsafouane · July 10, 2021, 12:45pm

I don’t quite get dataset shift. Looking at the formula, one can have a dataset shift as a consequence of either a covariate shift or a concept shift (mathematically, one may have the same joint distribution in either of these two cases, but that wouldn’t really happen in real life).

My problem is that I can’t give a proper definition of dataset shift (as a standalone type of shift) without defining it as a consequence of either covariate or concept shift.
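The relationship described in this post can be made precise with the chain rule of probability, using the same P_train / P_serve notation as the first post. This is a short worked identity and is not taken from the course material:

```latex
P_{\text{train}}(x, y) = P_{\text{train}}(y \mid x)\, P_{\text{train}}(x),
\qquad
P_{\text{serve}}(x, y) = P_{\text{serve}}(y \mid x)\, P_{\text{serve}}(x)

% If the joints are equal, marginalizing over y gives equal marginals:
P_{\text{train}}(x) = \sum_y P_{\text{train}}(x, y) = \sum_y P_{\text{serve}}(x, y) = P_{\text{serve}}(x)

% and dividing by P(x) wherever P(x) > 0 gives equal conditionals:
P_{\text{train}}(y \mid x) = \frac{P_{\text{train}}(x, y)}{P_{\text{train}}(x)}
                           = \frac{P_{\text{serve}}(x, y)}{P_{\text{serve}}(x)}
                           = P_{\text{serve}}(y \mid x)
```

Read contrapositively, a covariate shift, or a concept shift at some x with P(x) > 0, implies a dataset shift. Conversely, a dataset shift implies at least one of the two. One way to read this is that dataset shift works as the umbrella condition rather than as a third, independent mechanism.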
