Different embedding models?

By default, all the examples use OpenAI's embeddings, which cost money. Is the quality of their embeddings actually "better" for retrieval? If we use a different embedding model together with an OpenAI LLM, will the results be worse? Thank you.

To create your embeddings database you can use any other embedding library. As long as you use the same embedding model to encode your documents and your queries, search by proximity, and map the matches back to their original text, it will work.

So feel free to experiment with another embedding library: swap out the one that comes with the labs and compare the results.
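Roughly, the swap looks like this (a sketch, not the exact lab code; the model names and the sentence-transformers library are just example choices):

```python
# A rough comparison of the two approaches (a sketch, not the exact lab code).
# OpenAI embeddings via the openai SDK (v1+):
from openai import OpenAI

client = OpenAI()
vec_openai = client.embeddings.create(
    model="text-embedding-3-small",          # assumed model name
    input="Some text to embed",
).data[0].embedding

# A free local alternative, e.g. sentence-transformers (assumed choice):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model name
vec_local = model.encode("Some text to embed")

# The two vectors have different dimensions and are NOT interchangeable:
# whichever model you pick must encode both your documents and your queries.
```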

Let's say you are building a system that answers questions from a document.

The process is the same:

1. Take your original data.
2. Create embeddings with it.
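
A minimal sketch of these two steps, assuming sentence-transformers and a plain Python list as the "database" (any embedding library and vector store follow the same pattern):

```python
# Sketch of steps 1-2: embed each chunk of your data and keep the vector
# next to its original text. The model name is an assumed example.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Step 1: your original data, already split into chunks.
documents = [
    "Chunk 1 of your document...",
    "Chunk 2 of your document...",
]

# Step 2: one embedding per chunk, stored as (text, vector) pairs.
index = [(text, model.encode(text)) for text in documents]
```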

Then:
3. Take the user input, for example a question.
4. Encode it using the same embedding library.
5. Search for matches (cosine similarity is very common).
6. Gather the results.
7. Decode the matches back to text.
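
Steps 3 to 7 could look like this (again just a sketch, reusing the same assumed model and the (text, vector) index from above; the cosine helper is written by hand here, not a library function):

```python
# Sketch of steps 3-7: encode the question with the SAME model, rank the
# stored chunks by cosine similarity, and keep the text of the best matches.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
index = [(t, model.encode(t)) for t in ["Chunk 1...", "Chunk 2..."]]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Steps 3-4: take the user's question and encode it with the same model.
question = "What does the document say about X?"
q_vec = model.encode(question)

# Steps 5-6: rank the stored chunks by similarity and gather the best ones.
ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)

# Step 7: "decoding" here just means going back to the stored text.
top_chunks = [text for text, _ in ranked[:3]]
```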

Then:
8. Build your prompt with the decoded text, the user question, and any additional information.
9. Call the LLM.
10. Get the results.
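
And a sketch of steps 8 to 10, assuming the openai Python SDK (v1+); the model name and variable names are illustrative, not a requirement:

```python
# Sketch of steps 8-10: build the prompt from the retrieved context and
# send it to a chat model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

top_chunks = ["...text retrieved in the previous step..."]
question = "What does the document say about X?"

# Step 8: prompt = retrieved context + user question (+ any extra instructions).
prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n" + "\n".join(top_chunks) + "\n\n"
    "Question: " + question
)

# Steps 9-10: call the LLM and read back the answer.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; any chat-capable LLM works
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```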

Please try it and share your findings!

NOTE: If you are going to build a solution that needs to scale, you may want to use a dedicated vector database platform. Pinecone, for example, is a very good one.
