RAG and Data Privacy

Rahul36 · January 14, 2024, 6:52pm

Hi all,
I have been reading up on RAG and how you can extend a model’s context by supplying your own proprietary material, so that when asked questions it can take this new data into account and give more meaningful answers.

My question is: when you extend a model’s capability by feeding it your own proprietary data through RAG, does that data get indexed into the model’s database? In other words, do we lose confidentiality over the data used to build the context, or is it ephemeral?
I have been trying to understand this part but am still confused. Any response addressing this would be appreciated.

YANG_FAN · January 15, 2024, 1:49am

That depends on your contract with the LLM provider.
There are different tiers:

You are using ChatGPT for your individual use. Your conversations with the GPT model may be used to improve the model, unless you opt out.
You are calling an API as part of the business offering from OpenAI (or another LLM provider). There should be a contract stating that your data won’t be used for training.
https://help.openai.com/en/articles/5722486-how-your-data-is-used-to-improve-model-performance
You are using an open-source model and hosting it either within your own virtual network or on-prem. Your data never leaves your environment.
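
On the RAG part specifically, a minimal sketch may help (an illustration under assumptions, not any provider's actual pipeline). The documents stay in your own store, and only the retrieved snippet is pasted into the prompt for that one request. `DOCUMENTS`, `retrieve`, and `build_prompt` are hypothetical names, the retriever is a toy keyword-overlap scorer standing in for a vector index, and no real LLM API is called.

```python
import re
from collections import Counter

# Hypothetical proprietary documents; they live only in this local list.
DOCUMENTS = [
    "Refund requests are approved within 14 days of purchase.",
    "The on-call rotation changes every Monday at 09:00.",
    "Enterprise customers get a dedicated support channel.",
]


def tokenize(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Toy retriever (assumption): rank documents by shared-token count."""
    q = tokenize(question)
    scored = [(sum((q & tokenize(d)).values()), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no tokens with the question.
    return [d for score, d in scored[:top_k] if score > 0]


def build_prompt(question, documents, top_k=1):
    """Assemble the per-request prompt; nothing is written into the model."""
    context = "\n".join(retrieve(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Only this string would be sent to the model for this single request.
prompt = build_prompt("How long do refund requests take?", DOCUMENTS)
print(prompt)
```

What leaves your machine is just `prompt`. Whether the provider retains or trains on it is the contract question covered by the tiers above.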

Hope this is helpful.

rmalek · May 7, 2024, 8:14am

I investigated this topic in depth and found that the OpenAI API does not use any inputs or outputs for model training or improvement.

More information can be found here
