I recently completed a project that used the FAISS vector store. I used LangChain to store embeddings generated from PDF files. For that project, it was enough to store everything together, without separating the data by user.
What I want to know is: when a user uploads a PDF, can I create embeddings for it and store them in the vector database so that I can later query only that user's embeddings? That would keep the generated output accurate and also preserve privacy. Is this possible, and if so, how?
I really appreciate any help!
Vector stores hold two pieces of information for every entry:
- Embedding (generated from content)
- Metadata (set by the developer)
Metadata can contain fields such as document_name and, for your use case, an identifier for the user who uploaded the file (for example, a user_id field). You set it either at the document level or at the store level, depending on what the underlying vector database supports.
At query time, your implementation should first apply a filter so that only documents matching the metadata criteria (for example, the current user's ID) are considered. Vector similarity is then computed over that subset alone to answer the query.
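To make the filter-then-rank idea concrete, here is a minimal sketch in plain Python. It is not the FAISS or LangChain API: TinyVectorStore, Entry, the user_id field, and the two-dimensional embeddings are all hypothetical names and values made up for illustration. A real setup would use embeddings from your model and the filtering mechanism of your vector store.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One stored item: an embedding plus developer-set metadata."""
    embedding: list
    metadata: dict = field(default_factory=dict)
    text: str = ""


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store, only to illustrate metadata filtering."""

    def __init__(self):
        self.entries = []

    def add(self, embedding, text, **metadata):
        self.entries.append(Entry(embedding, metadata, text))

    def search(self, query_embedding, k=3, **filters):
        # Step 1: keep only entries whose metadata matches every filter.
        candidates = [
            e for e in self.entries
            if all(e.metadata.get(key) == value for key, value in filters.items())
        ]
        # Step 2: rank the remaining entries by vector similarity.
        ranked = sorted(
            candidates,
            key=lambda e: cosine(query_embedding, e.embedding),
            reverse=True,
        )
        return ranked[:k]


store = TinyVectorStore()
# Toy embeddings; in practice these come from your embedding model.
store.add([1.0, 0.0], "alice chunk about invoices",
          user_id="alice", document_name="invoices.pdf")
store.add([0.9, 0.1], "bob chunk about invoices",
          user_id="bob", document_name="bills.pdf")
store.add([0.0, 1.0], "alice chunk about travel",
          user_id="alice", document_name="trip.pdf")

# Only alice's chunks are candidates, even though bob's chunk is similar.
results = store.search([1.0, 0.0], k=2, user_id="alice")
for r in results:
    print(r.metadata["document_name"], "|", r.text)
```

As far as I know, LangChain's FAISS wrapper accepts a filter dictionary on similarity_search that plays the same role, but check the documentation for the version you are using before relying on it.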
Hey! Thank you so much. I will use this method. Once again thank you for your time.