Create Pinecone index with re-ranking and auto-merging


I’m following the course and, using the code from lesson 5, I can create a local index with auto-merging and re-ranking. I can then build the query engine from it and query it as usual.

My problem comes when I try to create the index in Pinecone instead of locally. I have created other Pinecone indexes in the past, but never with the auto-merging and re-ranking capabilities.

My code is as follows:

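# Imports (paths assume llama_index 0.9.x and pinecone-client 2.x)
import pinecone
from llama_index import ServiceContext, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.llms import OpenAI
from llama_index.node_parser import HierarchicalNodeParser, get_leaf_nodes
from llama_index.vector_stores import PineconeVectorStore

# Create Pinecone index (assumes pinecone.init(...) has already been called)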
index_name = 'llama-index-pinecone-test'
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=384, metric='cosine')
pinecone_index = pinecone.Index(index_name)
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)

# Loads documents with LlamaIndex
path = 'path_to_files'
documents = SimpleDirectoryReader(path).load_data()

# Custom Hierarchical Node Parsing
node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[2048, 512, 128])
nodes = node_parser.get_nodes_from_documents(documents)
leaf_nodes = get_leaf_nodes(nodes)

# Service and Storage Context Setup
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")
storage_context = StorageContext.from_defaults(vector_store=PineconeVectorStore(pinecone_index=pinecone_index))

# Add documents to Storage
automerging_index = VectorStoreIndex(leaf_nodes, storage_context=storage_context, service_context=service_context)

That seems to work and I can see my vectors in the Pinecone console. The problem is that when I try to query the index like this:

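# Imports for the query side (paths assume llama_index 0.9.x)
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import AutoMergingRetriever

# Create Auto Merging Query Engine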
base_retriever = automerging_index.as_retriever(similarity_top_k=12)
retriever = AutoMergingRetriever(base_retriever, automerging_index.storage_context, verbose=True)
rerank = SentenceTransformerRerank(top_n=6, model="BAAI/bge-reranker-base")
query_engine = RetrieverQueryEngine.from_args(retriever, node_postprocessors=[rerank])
res = query_engine.query("some query")

I get an error like this: doc_id 58e22eba-5067-434c-88e5-8818b433e6c0 not found.

I can see that document indexed in my Pinecone console, so I assume it does exist and that, for some reason, my query engine is not finding it.
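One thing that may be worth separating here is the vector store from the docstore. The sketch below is a toy model in plain Python, not LlamaIndex code: `ToyNode`, `ToyDocstore` and `merge_parent` are hypothetical names, and it assumes that the auto-merging step looks nodes up by ID in a node store that is kept apart from the vectors. Under that assumption, an ID can be visible in the vector store and still fail a lookup by doc_id.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a hierarchical node (not a LlamaIndex class)."""
    node_id: str
    text: str
    parent_id: Optional[str] = None


@dataclass
class ToyDocstore:
    """Hypothetical ID -> node store, kept separate from the vector store."""
    nodes: Dict[str, ToyNode] = field(default_factory=dict)

    def add(self, nodes: List[ToyNode]) -> None:
        for node in nodes:
            self.nodes[node.node_id] = node

    def get(self, doc_id: str) -> ToyNode:
        if doc_id not in self.nodes:
            # Wording copied from the error in the question, for illustration only.
            raise ValueError(f"doc_id {doc_id} not found.")
        return self.nodes[doc_id]


def merge_parent(leaf: ToyNode, docstore: ToyDocstore) -> ToyNode:
    """Resolve a retrieved leaf to its parent, as an auto-merging step might."""
    if leaf.parent_id is None:
        return leaf
    return docstore.get(leaf.parent_id)


parent = ToyNode("parent-1", "full section text")
leaf = ToyNode("leaf-1", "small chunk", parent_id="parent-1")

# Case 1: the leaf was embedded and stored as a vector,
# but no node was ever added to the docstore.
vector_store = {leaf.node_id: [0.1, 0.2, 0.3]}  # made-up embedding
empty_docstore = ToyDocstore()
try:
    empty_docstore.get(leaf.node_id)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Case 2: every node (parents and leaves) was also added to the docstore,
# so the leaf can be resolved to its parent.
full_docstore = ToyDocstore()
full_docstore.add([parent, leaf])
merged = merge_parent(leaf, full_docstore)
```

If the real retriever behaves like this toy, the open question is how all of `nodes` (not only `leaf_nodes`) are meant to reach the storage context when the vectors live in Pinecone.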

I have tried re-indexing several times, with different documents and different configurations, but I can’t solve the problem. Any help would be much appreciated.