Copilot says:
You’re right to think carefully about this — in a standard RAG pipeline, proprietary documents are retrieved and appended to the user prompt before being sent to an LLM. If the LLM is not self‑hosted, then yes: the provider may log prompts for safety, evaluation, or system improvement unless you’re using a deployment mode that explicitly disables it.
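To make the data flow concrete, here is a minimal sketch of that standard pipeline. All names (`retrieve_documents`, `build_augmented_prompt`, `call_llm`) are hypothetical placeholders, and the keyword "retrieval" is a toy stand-in for real vector search:

```python
# Minimal sketch of a standard RAG flow: retrieve proprietary documents,
# append them to the user prompt, then send the combined text to an LLM.
# All function names here are illustrative, not a real library's API.

def retrieve_documents(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval, standing in for a real vector search."""
    scored = sorted(
        corpus,
        key=lambda d: -sum(w in d.lower() for w in query.lower().split()),
    )
    return scored[:k]

def build_augmented_prompt(query: str, docs: list[str]) -> str:
    """Append retrieved (possibly proprietary) documents to the user prompt."""
    context = "\n\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Internal design doc: the billing service uses a message queue.",
    "HR policy: all leave requests require manager approval.",
]
query = "How does the billing service work?"
prompt = build_augmented_prompt(query, retrieve_documents(query, corpus))

# This `prompt` string, proprietary context included, is exactly what a
# non-self-hosted provider receives (and may log):
# response = call_llm(prompt)
```

The key point the sketch illustrates: the provider never sees your vector store, but it does see every retrieved document that ends up in the augmented prompt.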
There is active research and guidance on this topic. A few useful references:
1. Privacy risks in RAG systems
A recent academic survey highlights how RAG introduces new privacy concerns beyond normal LLM usage, including leakage through retrieved documents and access patterns.
2. Private RAG architectures
There is also work on “Private RAG,” which uses trusted execution environments (TEEs) so that the server cannot see the user query, retrieved documents, or access patterns.
3. Enterprise RAG whitepapers
Industry whitepapers (e.g., Intel’s RAG whitepaper) discuss how companies can deploy RAG pipelines on their own infrastructure to avoid sending proprietary data to external LLMs.
What this means in practice
If your company handles PII or regulated data, you generally have three options:
Option A — Use a self‑hosted or VPC‑isolated LLM
This ensures prompts never leave your environment. Many enterprises choose this for compliance reasons.
Option B — Use a provider’s “no‑logging” or “zero‑retention” mode
Most major LLM vendors now offer enterprise tiers where prompts are not stored, logged, or used for training.
Option C — Use privacy‑preserving RAG designs
Techniques like TEEs, encrypted retrieval, or on‑prem vector stores keep the retrieval step entirely inside your infrastructure, so only the final generation step sends data to the model, and even that step can be privacy‑protected (for example, by running inside a TEE) depending on the deployment.
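The on‑prem retrieval idea behind Option C can be sketched in a few lines. This is a toy in‑memory "vector store" with hand‑rolled embeddings (a real deployment would use a local embedding model); the point is that similarity search runs entirely on your infrastructure, and only the final augmented prompt, if anything, ever leaves it:

```python
# Sketch of on-prem retrieval: the vector store and similarity search stay
# inside your environment. The embeddings below are made-up toy vectors.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# On-prem "vector store": (embedding, document) pairs held in local memory.
store = [
    ([1.0, 0.0, 0.2], "Q3 revenue figures (confidential)"),
    ([0.1, 1.0, 0.0], "Office seating chart"),
]

# Embedding of the user's query, computed locally with a local model.
query_vec = [0.9, 0.1, 0.3]

best = max(store, key=lambda item: cosine(item[0], query_vec))
retrieved_doc = best[1]
# `retrieved_doc` was found without any external call. Whether it is then
# sent to a cloud LLM, a VPC-isolated one, or a TEE-protected one is the
# deployment decision Options A-C describe.
```

Swapping the toy vectors for a locally hosted embedding model and the list for an on‑prem vector database preserves the same property: retrieval leaks nothing to a third party.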
Bottom line
RAG itself is not the privacy risk — where you send the final augmented prompt is. If you’re using a cloud LLM, you must verify the provider’s retention and logging policies. If you’re using a self‑hosted model, you control the entire pipeline.
The papers below are a good starting point if you want to dive deeper into the formal privacy analysis of RAG systems.
Clean Citation Block
Academic Research on RAG Privacy Risks
The Good and The Bad: Exploring Privacy Issues in Retrieval‑Augmented Generation (RAG). Findings of ACL 2024.
Risk Assessment & Mitigation Frameworks
Securing RAG: A Risk Assessment and Mitigation Framework. arXiv preprint (2025).
Industry Whitepapers on Enterprise‑Grade RAG
Intel. LangChain Retrieval‑Augmented Generation White Paper.
Additional Security‑Focused RAG Research
Advancing Privacy and Security in Generative AI‑Driven RAG Architectures: A Next‑Generation Framework. International Journal of Artificial Intelligence & Applications.