Data Skew - joint, marginal and conditional probability?

Hi all!

I was doing just fine with the course until I saw this image in the lecture regarding the types of skew and their probabilistic analogies below:

I understand what joint, marginal, and conditional probability refer to in general, but I didn’t quite follow how these concepts relate to the training and serving data.

Does anyone have an explanation or examples to help describe what is going on in this slide?

Thanks in advance :smiley:

Training data refers to the dataset you use to build your model. Usually, this is historical data that’s applicable to your problem.
Serving data refers to what your model encounters when deployed. This is the dataset you want the trained model to do well on. Serving could be as simple as invoking your model via an HTTP call for prediction.

With this in mind, please watch the examples in this lecture.
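
If it helps, here is a small sketch of how I would read the probabilistic terms. I am assuming the slide follows the usual definitions: a change in the marginal P(x) between training and serving with the same conditional P(y|x) is covariate shift, a change in P(y|x) with the same P(x) is concept shift, and any change in the joint P(x, y) is dataset shift in general. The data is synthetic and the helper names (`sample`, `marginal_x`, `conditional_y`) are made up for illustration; they are not from the course.

```python
import random

def sample(n, p_x1, p_y1_given_x, seed):
    """Draw n (x, y) pairs with a binary feature x and a binary label y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < p_x1 else 0
        y = 1 if rng.random() < p_y1_given_x[x] else 0
        rows.append((x, y))
    return rows

def marginal_x(rows):
    """Empirical marginal P(x = 1)."""
    return sum(x for x, _ in rows) / len(rows)

def conditional_y(rows):
    """Empirical conditional P(y = 1 | x) for x = 0 and x = 1."""
    out = {}
    for v in (0, 1):
        ys = [y for x, y in rows if x == v]
        out[v] = sum(ys) / len(ys)
    return out

# Training data: hypothetical P(x = 1) = 0.3, P(y = 1 | x) = {0: 0.2, 1: 0.8}
train = sample(20000, 0.3, {0: 0.2, 1: 0.8}, seed=0)

# Serving data A: the feature distribution moved, the feature-label relation did not
serve_covariate = sample(20000, 0.7, {0: 0.2, 1: 0.8}, seed=1)

# Serving data B: same feature distribution, but the feature-label relation changed
serve_concept = sample(20000, 0.3, {0: 0.2, 1: 0.4}, seed=2)

for name, rows in [("train", train),
                   ("serving (covariate shift)", serve_covariate),
                   ("serving (concept shift)", serve_concept)]:
    print(name, "P(x=1) ~", round(marginal_x(rows), 2),
          "P(y=1|x) ~", {k: round(v, 2) for k, v in conditional_y(rows).items()})
```

In both serving sets the joint P(x, y) = P(y|x) P(x) differs from training, but for different reasons: in the first only the P(x) factor moved, in the second only the P(y|x) factor did. In practice you usually don't have labels at serving time, so the marginal comparison is the one you can monitor directly.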