RAG and Data Privacy

Hi all,
I have been reading up on RAG and how you can extend a model's context by providing it with your own proprietary material, so that when asked questions it can take this new data into account and come up with more meaningful answers.

My question is: when you extend a model's capability by feeding it your own proprietary data through RAG techniques, does that data get indexed into the model's database? In other words, do we lose confidentiality over the data that was used to build the context, or is it ephemeral?
I have been trying to understand this part but am confused. Any response addressing this would be appreciated.

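RAG by itself does not change the model: the retrieved text is only sent along with your prompt at query time. Whether the provider keeps or trains on that prompt depends on your contract with the LLM provider.
There are different tiers: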

  1. You are using ChatGPT for your individual use. Your conversations with the GPT model may be used to improve the model, unless you opt out.
  2. You are calling an API as part of the business offering from OpenAI (or another LLM provider). There should be a contract stating that your data won't be used for training.
    https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance
  3. You are using an open-source model and hosting it either within your own virtual network or on-prem. Your data stays within infrastructure you control.
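
To make the "is it ephemeral?" part more concrete, here is a minimal sketch of what a RAG step usually looks like. Everything in it is an assumption for illustration: the `DOCUMENTS` list stands in for your own document store, the word-overlap ranking stands in for embedding similarity search, and `retrieve` and `build_prompt` are hypothetical helper names, not part of any library. The point is only that the model itself is never modified; the retrieved text ends up in the prompt string, and that string is what gets sent to whichever model you call.

```python
import re

# Hypothetical in-house document store. In a real setup this would typically
# be a vector database that you host or rent, separate from the model.
DOCUMENTS = [
    "Refunds are processed within 14 days of a return request.",
    "Support is available on weekdays from 9am to 5pm.",
    "Enterprise contracts renew annually unless cancelled.",
]


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question.

    This is a simple stand-in for embedding similarity search.
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Place the retrieved snippets into the prompt that goes to the model."""
    context = "\n".join(retrieve(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Only this string leaves your side. The model's weights are not touched.
prompt = build_prompt("How long do refunds take?", DOCUMENTS)
print(prompt)
```

So the confidentiality question comes down to who receives `prompt` and what they are allowed to do with it, which is where the three tiers above come in.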

Hope this is helpful.