One of the AI4Good considerations is to avoid storing or publishing PII, proprietary, or private data, or more generally any data whose status in those respects you cannot explicitly verify.
Now I am wondering what happens with vector encodings such as text or visual embeddings. Imagine you train a model, then delete the training data but keep the model. The model has, in some sense, memorized that data, but not in its original form, so it might not be directly identifiable. Can we store that model, for example as a pre-trained base model that we will later fine-tune on a different dataset? Publicly sharing the model would be ruled out by common sense, but is storing it OK? And what if we use a cloud provider to train and to save the checkpoints? I think there are many aspects to check if we want to follow the AI4Good default rules.
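To make the concern concrete, here is a toy sketch of the scenario I mean. It uses a trivial hashing-trick "embedding" (purely illustrative, not a trained model) to show that a stored vector is not the original text, yet can still be matched against a candidate record, which is exactly the re-identification worry:

```python
import hashlib
import math

def embed(text, dim=16):
    """Toy hashing-trick embedding: counts tokens into hashed buckets,
    then L2-normalizes. Illustrative only -- real embeddings come from
    trained models, but the privacy question is the same."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        h = int(hashlib.sha256(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

# A "sensitive" record is embedded, then the raw text is deleted.
record = "jane doe, patient id 12345, diagnosis x"
stored = embed(record)
del record  # the raw data is gone; only the vector remains

# The vector does not contain the text verbatim -- but anyone who can
# guess or obtain a candidate record can confirm a match against it.
candidate = embed("jane doe, patient id 12345, diagnosis x")
print(round(cosine(stored, candidate), 3))  # prints 1.0 for an exact match
```

So even after deleting the data, the stored vector (and, by extension, model weights that memorize training examples) can still act as a membership or re-identification oracle, which is why I am unsure whether keeping such artifacts counts as "storing" the data under the rules.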
Could somebody with experience share some thoughts? Much appreciated.