Chroma vs FAISS vs...for vector store?

Until I know better, I’m staying away from cloud vector stores. I’ve been using FAISS; the course uses Chroma. I didn’t realize I could persist it! YAY! Is one better than the other? Does it matter? Why pick one over the other? Thank you.

Both are very good. I personally use Chroma, but if you are getting the results you expect with FAISS, there’s no reason to change. The course probably uses Chroma because it is very well known.
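For persistence, both can be saved to disk. A minimal sketch, assuming LangChain’s FAISS and Chroma wrappers with OpenAI embeddings (exact import paths vary by LangChain version, and the folder names and sample document here are just placeholders):

```python
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS, Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
docs = [Document(page_content="Item Id: 1, Status: Open")]  # stand-in documents

# FAISS: build once, write the index to disk, reload it in a later session
faiss_db = FAISS.from_documents(docs, embeddings)
faiss_db.save_local("faiss_index")
faiss_db = FAISS.load_local(
    "faiss_index", embeddings,
    allow_dangerous_deserialization=True,  # only needed on recent versions
)

# Chroma: point it at a persist_directory and the collection is written there
chroma_db = Chroma.from_documents(docs, embeddings, persist_directory="./chroma_db")
chroma_db = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
```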

What are some ways you can “debug” vector stores? I’m getting inconsistent results from my CSV datasets.

@thecsmbuilder I am not very knowledgeable, so take this with a grain of salt. I am finding LangChain’s debugging features helpful, and my intent in answering is to see whether those debugging methods can help here.
See:
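In the meantime, this is the kind of thing I mean by LangChain’s debugging features. A minimal sketch, assuming a recent LangChain version; `vectorstore` stands in for whatever FAISS/Chroma index you have built:

```python
from langchain.globals import set_debug

set_debug(True)  # dump every chain, retriever and LLM call with its inputs and outputs
# from langchain.globals import set_verbose; set_verbose(True)  # lighter-weight alternative

# It also helps to look at what the retriever returns for a query, with scores,
# before anything reaches the LLM. `vectorstore` is your FAISS/Chroma instance.
for doc, score in vectorstore.similarity_search_with_score("Items with status Open", k=5):
    print(round(score, 3), doc.page_content[:120])
```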


This is an interesting question. I have to start by saying that I don’t have an answer yet, but I want to learn more about your specific case. What are you seeing? Why do you need to debug vector stores? What is the bug?


Thank you for your response. I will check these videos out.

I’m loading a CSV file and want to perform Q&A over the data: how many items are in X state, how many items were created on this date, etc. The file contains about 2k items with varying statuses, dates, and so forth. When I run the chain and ask a question, I get a response that it cannot find items in that state, and the chain returns records from a different state. Looking at the data, 1900 items are in state 1, 100 are in another state, and 100 are in a third. The states are Open, Closed, Blocked, etc.

It is important to build the vectors properly.

Let’s say the columns in your CSV are:

ItemId, ItemName, ItemDescription, CreationDate, ItemStatus

I would create a new column called “Data” formatted like this:

Item Id: xxxx, Item Name: xxxxx, Description: xxxxx, Creation Date: xxxxx, Status: xxxxx

You would add this ‘Data’ column to your dataframe and then vectorize this new column. After that, you have to:

  1. Enter a question (e.g., “Items with status Open”)
  2. Encode the question with the same encoder used to create the embeddings database
  3. Find the closest vectors by proximity
  4. Pass the question and the resulting set from the proximity search to an LLM (OpenAI’s GPT, for example)
  5. Get the LLM response

I would expect this to work better.
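Putting those steps together, something like the sketch below is what I have in mind. It assumes pandas, LangChain’s Chroma wrapper and OpenAI models; `items.csv`, the question text and `k=10` are placeholders you would adjust for your data:

```python
import pandas as pd
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# 1. Build the combined "Data" column, one readable string per row
df = pd.read_csv("items.csv")
df["Data"] = (
    "Item Id: " + df["ItemId"].astype(str)
    + ", Item Name: " + df["ItemName"]
    + ", Description: " + df["ItemDescription"]
    + ", Creation Date: " + df["CreationDate"].astype(str)
    + ", Status: " + df["ItemStatus"]
)

# 2. Vectorize only the new column
docs = [Document(page_content=row) for row in df["Data"]]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

# 3. Encode the question with the same embeddings and find the nearest rows
question = "Which items have status Open?"
nearest = vectorstore.similarity_search(question, k=10)

# 4. Pass the question plus the retrieved rows to the LLM
context = "\n".join(d.page_content for d in nearest)
llm = ChatOpenAI(model="gpt-3.5-turbo")
answer = llm.invoke(f"Answer using only this data:\n{context}\n\nQuestion: {question}")

# 5. Read the response
print(answer.content)
```

One caveat: for counting-style questions (“how many items are Open”), only the k retrieved rows ever reach the LLM, so exact counts over ~2k rows may need a larger k or a plain dataframe query alongside the vector store.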


Thanks Juan, will give this a try and report back results.

Juan, quick question: I got a little lost trying to make this work and follow these steps; I’m making it too complicated. Do you have a workbook/notebook I can use?

The ones I have with Chroma I cannot share, but I do have a project that may help you:

Embeddings project

Hopefully it helps!
