Difference between drift and skew

Hello, I am confused about the definitions of drift and skew. The definition given in the lecture is attached as the first image.

However, the following slide shows an example of concept drift.

Since the slide shows a difference between the training set and the serving set, why is it not distribution skew?

More importantly, in the first attached image, skew is defined as a difference between two static versions, such as the training and serving sets. However, the serving dataset is the latest version while the training dataset is an older one. Why can we call them static versions? I think the difference between training and serving data should be described as a change in data over time, which is the definition of drift.

Could you help me clear up this confusion? Thanks.

In the case of skew, we don’t care when the data was collected. When it comes to drift, time plays a major role.
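
To make the distinction concrete, here is a minimal sketch on synthetic data. It is only an illustration: the helper names (`mean_gap`, `skew_gap`, `drift_gaps`) and the weekly batches are made up for this example, and the difference in means is a crude stand-in for whatever distribution distance your tooling actually uses.

```python
import random
import statistics

def mean_gap(a, b):
    """Absolute difference in means, a crude stand-in for a distribution distance."""
    return abs(statistics.fmean(a) - statistics.fmean(b))

def skew_gap(training, serving):
    # Skew: a single comparison between two fixed datasets; timestamps are ignored.
    return mean_gap(training, serving)

def drift_gaps(windows):
    # Drift: compare each later time window against the first one, in time order.
    baseline = windows[0]
    return [mean_gap(baseline, w) for w in windows[1:]]

rng = random.Random(0)

# Two static datasets whose feature means differ by about 1.0.
training = [rng.gauss(0.0, 1.0) for _ in range(2000)]
serving = [rng.gauss(1.0, 1.0) for _ in range(2000)]

# Hypothetical weekly batches whose mean creeps upward by 0.5 per week.
weekly = [[rng.gauss(0.5 * week, 1.0) for _ in range(2000)] for week in range(4)]

skew = skew_gap(training, serving)
gaps = drift_gaps(weekly)
print("skew (training vs serving):", round(skew, 2))
print("drift (week 0 vs weeks 1..3):", [round(g, 2) for g in gaps])
```

The skew check returns one number and never looks at collection time. The drift check returns a series ordered by time, and the signal is the trend in that series.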

Regarding the graph:
Let’s say that the serving data is more recent than the training data in terms of when the data points were collected. If the mapping from input features to output label differs between the two subsets because of when the data was collected, then the concept relating features to label has changed and the model is obsolete. It is therefore important to retrain the model with newer data.
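
As a rough sketch of that situation (synthetic data, not the example from the slide): below, the input distribution is the same in both periods, but the rule that produces the label moves from `x > 0` to `x > 1`. The threshold "model" and the functions `accuracy` and `fit_threshold` are hypothetical and exist only for this illustration.

```python
import random

def accuracy(threshold, xs, ys):
    """Fraction of points where the rule `x > threshold` matches the label."""
    hits = sum((x > threshold) == y for x, y in zip(xs, ys))
    return hits / len(xs)

def fit_threshold(xs, ys):
    """Pick the cut-off on a coarse grid (-3.0 to 3.0) that best reproduces the labels."""
    candidates = [t / 10 for t in range(-30, 31)]
    return max(candidates, key=lambda t: accuracy(t, xs, ys))

rng = random.Random(0)

# Older (training) period: label is True when x > 0.
old_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
old_y = [x > 0.0 for x in old_x]

# Newer (serving) period: same input distribution, but label is True only when x > 1.
new_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
new_y = [x > 1.0 for x in new_x]

model = fit_threshold(old_x, old_y)
acc_old = accuracy(model, old_x, old_y)
acc_new = accuracy(model, new_x, new_y)

# Retraining on the newer data recovers the new mapping.
retrained = fit_threshold(new_x, new_y)
acc_retrained = accuracy(retrained, new_x, new_y)

print("old model on old data:", acc_old)
print("old model on new data:", acc_new)
print("retrained model on new data:", acc_retrained)
```

The inputs look the same in both periods, so a check on the feature distribution alone would not flag anything here. The drop only appears once labels for the newer data are available.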

It’s also possible for the serving dataset to be older than the training data. This usually happens when a model is built using recent data and is used to fill in historical records (the serving dataset), assuming that the distributions of the training and serving data are similar.

Some groups create the serving (test) and training datasets from the same snapshot of the available data. For example, look at papers that measure model performance on reference test sets to provide evidence that their approach is better than other approaches.

Hello, thanks for your clarification.

Could you also confirm whether skew has the same meaning as shift? For example, is covariate shift the same as covariate skew?

Your understanding is correct (although I haven’t observed anyone call it covariate skew). I recommend sticking with the terms in the slides and checking with the team you work with for their preference. One way to use the terms is to call your observation a skew and categorize it under a particular shift. See this as well.

Thanks for your clarification!