Chroma vs FAISS vs...for vector store?

Until I know better, I’m staying away from cloud vector stores. I’ve been using FAISS; the course uses Chroma. I didn’t realize I could persist it! YAY! Is one better than the other? Does it matter? Why pick one over the other? Thank you.

Both are very good. I personally use Chroma, but if you are getting the results you expect with FAISS, there’s no reason to change. The course probably uses Chroma because it is very well known.
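For persistence, both can be saved to disk. A minimal sketch, assuming LangChain’s FAISS and Chroma wrappers with OpenAI embeddings (exact import paths vary by LangChain version, and the folder names and sample document here are just placeholders):

```python
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS, Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
docs = [Document(page_content="Item Id: 1, Status: Open")]  # stand-in documents

# FAISS: build once, write the index to disk, reload it in a later session
faiss_db = FAISS.from_documents(docs, embeddings)
faiss_db.save_local("faiss_index")
faiss_db = FAISS.load_local(
    "faiss_index", embeddings,
    allow_dangerous_deserialization=True,  # only needed on recent versions
)

# Chroma: point it at a persist_directory and the collection is written there
chroma_db = Chroma.from_documents(docs, embeddings, persist_directory="./chroma_db")
chroma_db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
```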

What are some ways you can “debug” vector stores? I’m getting inconsistent results from my CSV datasets.

@thecsmbuilder I am not very knowledgeable, so take this with a grain of salt. I am finding LangChain’s debugging features helpful, and my intent in answering is to see whether those debugging methods can help here.
See:
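In the meantime, this is the kind of thing I mean by LangChain’s debugging features. A minimal sketch, assuming a recent LangChain version; `vectorstore` stands in for whatever FAISS/Chroma index you have built:

```python
from langchain.globals import set_debug

set_debug(True)  # dump every chain, retriever and LLM call with its inputs and outputs
# from langchain.globals import set_verbose; set_verbose(True)  # lighter-weight alternative

# It also helps to look at what the retriever returns for a query, with scores,
# before anything reaches the LLM. `vectorstore` is your FAISS/Chroma instance.
for doc, score in vectorstore.similarity_search_with_score("Items with status Open", k=5):
    print(round(score, 3), doc.page_content[:120])
```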


This is an interesting question. I have to start by saying that I don’t have an answer yet, but I want to learn more about your specific case. What are you seeing? Why do you need to debug vector stores? What is the bug?


Thank you for your response. I will check these videos out.

I’m loading a CSV file and want to perform Q&A over the data: how many items are in X state, how many items were created on this date, etc. The file contains about 2k items with varying statuses, dates, and so forth. When I run the chain and ask a question, I get a response that it cannot find items in that state, and the chain returns records from a different state. Looking at the data, 1900 items are in state 1, 100 are in another state, and 100 are in a third. The states are Open, Closed, Blocked, etc.

It is important to build the vectors properly.

Let’s say the columns in your CSV are:

ItemId, ItemName, ItemDescription, CreationDate, ItemStatus

I would create a new column called “Data” formatted like this:

Item Id: xxxx, Item Name: xxxxx, Description: xxxxx, Creation Date: xxxxx, Status: xxxxx

You would add this ‘Data’ column to your dataframe and then vectorize this new column. After that, you have to:

  1. Enter a question (e.g., “Items with status Open”)
  2. Encode the question with the same encoder used to create the embeddings database
  3. Find the closest vectors by proximity
  4. Pass the question and the resulting set from the proximity search to an LLM (OpenAI’s GPT, for example)
  5. Get the LLM response

I would expect this to work better.
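Putting those steps together, something like the sketch below is what I have in mind. It assumes pandas, LangChain’s Chroma wrapper and OpenAI models; `items.csv`, the question text and `k=10` are placeholders you would adjust for your data:

```python
import pandas as pd
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# 1. Build the combined "Data" column, one readable string per row
df = pd.read_csv("items.csv")
df["Data"] = (
    "Item Id: " + df["ItemId"].astype(str)
    + ", Item Name: " + df["ItemName"]
    + ", Description: " + df["ItemDescription"]
    + ", Creation Date: " + df["CreationDate"].astype(str)
    + ", Status: " + df["ItemStatus"]
)

# 2. Vectorize only the new column
docs = [Document(page_content=row) for row in df["Data"]]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

# 3. Encode the question with the same embeddings and find the nearest rows
question = "Which items have status Open?"
nearest = vectorstore.similarity_search(question, k=10)

# 4. Pass the question plus the retrieved rows to the LLM
context = "\n".join(d.page_content for d in nearest)
llm = ChatOpenAI(model="gpt-3.5-turbo")
answer = llm.invoke(f"Answer using only this data:\n{context}\n\nQuestion: {question}")

# 5. Read the response
print(answer.content)
```

One caveat: for counting-style questions (“how many items are Open”), only the k retrieved rows ever reach the LLM, so exact counts over ~2k rows may need a larger k or a plain dataframe query alongside the vector store.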


Thanks Juan, will give this a try and report back results.

Juan, quick question: I got a little lost trying to make this work and follow these steps; I’m making it too complicated. Do you have a workbook/notebook I can use?

The ones I have with Chroma I cannot share, but I do have a project that may help you:

Embeddings project

Hopefully it helps!
