Is covariate shift another name for data drift?
Dataset shift and covariate shift are different.
Dataset shift means that the joint distribution of the targets and the input features is different across the training and serving environments.
Covariate shift happens when the conditional distribution of y given x is the same across the training and serving environments, but the marginal distribution of x is different (i.e. the distribution of inputs to the model differs between training and serving).
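Written out in notation, the two definitions above could look like the following. The subscripts "train" and "serve" are just labels chosen here for the two environments; they are not taken from the course material.

```latex
\begin{align*}
\text{Dataset shift:}\quad
  & P_{\text{train}}(x, y) \neq P_{\text{serve}}(x, y) \\
\text{Covariate shift:}\quad
  & P_{\text{train}}(y \mid x) = P_{\text{serve}}(y \mid x),
  \qquad P_{\text{train}}(x) \neq P_{\text{serve}}(x)
\end{align*}
```

Since the joint distribution factors as P(x, y) = P(y | x) P(x), a change in P(x) alone already changes the joint, so under these definitions covariate shift is one particular way for dataset shift to occur.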
Actually I also have the same question, and I believe the reply from @balaji.ambresh does not answer what was asked.
Could we also conclude that Concept shift is another name for Concept drift?
In this blog post from Matthew Stewart PhD, Postdoc in ML at Harvard, Concept shift and Concept drift are used interchangeably. Is it a mistake?
He also mentions that Covariate shift and Concept shift are types of Dataset shift (I am not sure if Robert Crowe is saying the same thing).
There are multiple manifestations of dataset shift that we will examine:
- Covariate shift
- Prior probability shift
- Concept shift [then referred to as Concept drift]
- Internal covariate shift (an important subtype of covariate shift)
Assuming that this is correct, Robert Crowe also says in the lesson “Detecting data issue” that Dataset shift happens “when the data has shifted over time”. But this is also the definition of drift (from the same lesson), so I honestly can’t understand the distinction between drift and skew anymore (because Dataset, Covariate, and Concept shift are all types of distribution skew, or did I misunderstand this?).
Regarding covariate shift and data drift, this is my understanding:
I understand that Covariate Shift happens when the distribution of the features changes between datasets (train, val, test). This can happen, for example, when the source of the datasets is different. Example: generating training data with one camera, and test data with another camera that produces images with different characteristics.
I understand that Data Drift happens when the distribution of the features changes over time. An example I use to explain this to myself is Consumer Behavior. I could train a model with data acquired at a certain point in time, and at that time the model predicts properly, but, as time passes, Consumer Behavior changes, so the ‘data drifts’ and the model will not be as good at predicting with the new distribution of the features.
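One way to make the two cases concrete is to notice that the check itself can be the same in both; what differs is which two samples get compared (two datasets from different sources, or the same feature at two points in time). The sketch below uses a hand-written two-sample Kolmogorov-Smirnov statistic on synthetic data. The feature values, the window names, and the `THRESHOLD` cut-off are all assumptions made up for illustration, not values from the course.

```python
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        # Advance both samples past every value <= v, then compare the CDFs at v.
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

rng = random.Random(42)

# Synthetic stand-in for one numeric feature (hypothetical values).
reference = [rng.gauss(50.0, 10.0) for _ in range(2000)]    # data used for training
same_period = [rng.gauss(50.0, 10.0) for _ in range(2000)]  # fresh data, same distribution
later = [rng.gauss(58.0, 10.0) for _ in range(2000)]        # same feature after behavior changed

THRESHOLD = 0.1  # illustrative cut-off only, not a recommended value

d_same = ks_statistic(reference, same_period)
d_later = ks_statistic(reference, later)

print(f"reference vs same period: D={d_same:.3f} shifted={d_same > THRESHOLD}")
print(f"reference vs later:       D={d_later:.3f} shifted={d_later > THRESHOLD}")
```

In practice a library routine such as `scipy.stats.ks_2samp` would normally replace the hand-written function; it is spelled out here only to keep the example free of dependencies.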
What do you think?
Thanks a lot! Your reply is really helpful! I still have some doubts though:
To detect distribution skew, we compare train set and serving set, but if I understand correctly, the serving set is simply made of the queries received by the ML system in production, so how can we consider that static? I would definitely agree that we can apply the concept of distribution skew for train and test sets, which are typically fixed at training time.
Even Robert says that distribution skew manifests itself through dataset shift (which can be either covariate shift or concept shift), that is, when “data has shifted over time”, so he mentions the time component in a concept that was previously defined as static. I see a bit of a contradiction here, and I would love to hear other people’s opinion about it.
But the difference between skew and drift is clearer to me now, thank you. To make it even more understandable, would you agree that:
- data drift is essentially covariate shift in time, and
- concept drift is essentially concept shift in time?
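As a rough illustration of the two readings proposed above, the toy simulation below compares a training-time window with two later windows: one where only P(x) has moved, and one where only P(y | x) has moved. Everything in it (the Gaussian feature, the threshold labelling rule, the helper names) is a made-up assumption chosen to keep the two effects separate.

```python
import random

def sample_window(n, x_mean, threshold, rng):
    """Draw n (x, y) pairs with x ~ Normal(x_mean, 1) and y = 1 when x > threshold."""
    xs = [rng.gauss(x_mean, 1.0) for _ in range(n)]
    ys = [1 if x > threshold else 0 for x in xs]
    return xs, ys

def accuracy(xs, ys, model_threshold):
    """Accuracy of a frozen model that predicts 1 when x > model_threshold."""
    hits = sum((1 if x > model_threshold else 0) == y for x, y in zip(xs, ys))
    return hits / len(xs)

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)
MODEL_THRESHOLD = 0.0  # the "trained" model, fixed at training time

# Training-time window: x centred at 0, labelling rule y = [x > 0].
train_x, train_y = sample_window(5000, 0.0, 0.0, rng)

# Later window A: P(x) has moved, P(y | x) has not ("covariate shift in time").
drift_x, drift_y = sample_window(5000, 1.5, 0.0, rng)

# Later window B: P(x) is unchanged, P(y | x) has moved ("concept shift in time").
concept_x, concept_y = sample_window(5000, 0.0, 1.0, rng)

for name, xs, ys in [("train", train_x, train_y),
                     ("feature drift", drift_x, drift_y),
                     ("concept drift", concept_x, concept_y)]:
    print(f"{name:14s} mean(x)={mean(xs):5.2f} accuracy={accuracy(xs, ys, MODEL_THRESHOLD):.2f}")
```

In window A the accuracy does not drop only because this toy model matches the true labelling rule exactly; a real model fitted on a limited region of feature space can still degrade when P(x) moves. In window B the inputs look the same as at training time, so a check on the features alone would not reveal the change.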
Hi @Alessio_Molinari,
Thank you for your reply and insights.
I would agree that data drift = concept drift (represented by a change in the relationship between the features and the labels), and that covariate shift is a particular case of data drift (represented by a change in the distribution between training features and new features).