C2W2 lecture unclear instructions

This is regarding Machine Learning Data Lifecycle in Production > Week 2 > Preprocessing Data at Scale. I would appreciate elaboration and examples on the following points:

  1. The pros and cons of preprocessing the training dataset versus transforming within the model. Some terms are used vaguely, which does not help me prepare for the quiz.
  2. Please clarify “instance-level” transformation. Is there any transformation that does not touch the instances?

Thanks.

  • Regarding your second point, by “instance-level” transformation the instructor means any transformation that can be applied to a single instance without going through the entire dataset. For example, squaring a feature value.
  • “Full-pass” transformations, on the other hand, need one pass over the entire dataset to compute the relevant statistics before the transformation can be applied. For example, to apply standard scaling, you first have to go through the whole dataset to compute the mean and standard deviation.
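To make the distinction concrete, here is a minimal plain-Python sketch (not the course's or TensorFlow Transform's API; the function names are hypothetical). Squaring looks at one value at a time, while standard scaling needs a full pass to compute the mean and standard deviation before any value can be scaled. Using the population standard deviation (`statistics.pstdev`) is an assumption made for this example.

```python
import statistics

def square_feature(x: float) -> float:
    # Instance-level: only needs the value of this single instance.
    return x * x

def fit_standard_scaler(values: list[float]) -> tuple[float, float]:
    # Full-pass: the mean and standard deviation depend on every value.
    # Assumption: population standard deviation (pstdev) is used here.
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, std

def apply_standard_scaler(x: float, mean: float, std: float) -> float:
    # Once the statistics are known, scaling one instance is cheap.
    # Guard against a zero standard deviation (constant feature).
    return (x - mean) / std if std > 0 else 0.0

train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Instance-level: each output depends only on its own input.
squared = [square_feature(x) for x in train]

# Full-pass: compute statistics over the whole dataset first, then apply.
mean, std = fit_standard_scaler(train)
scaled = [apply_standard_scaler(x, mean, std) for x in train]
```

Note that after the full pass is done, applying the stored `mean` and `std` to a new instance (for example, at serving time) is itself an instance-level operation; only computing the statistics requires seeing the entire dataset.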