Data-Centric AI for Distributed Training

If you have a large dataset (terabytes of data) and want to train your model on it using a data-centric approach, how do you do it?