Retrieval issue

Abad100 · November 17, 2024, 12:06pm

Chunk 0 contains the co-author info, but the RAG pipeline is not able to retrieve it when I ask who the co-authors are. (I used my own PDF.)
I created the embeddings with text-embedding-ada-002, use Pinecone to store them,
and use GPT-3.5 Turbo for prompting. This is my retriever:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)

How can I solve this issue?
I created the chunks using unstructured's chunk_by_title:
chunks = chunk_by_title(
    pdf_elements,
    combine_text_under_n_chars=100,
    max_characters=3000,
)
Here is a sample chunk item:
{'type': 'CompositeElement',
 'element_id': '70f96630-9e05-4f04-80e0-eb04c02866bf',
 'text': 'Capturing Collective Progress on Adaptation: A Proposal to move forward on the UNFCCC Global Stocktake\n\nUNITED NATIONS DEVELOPMENT PROGRAMME Capturing Collective Progress on Adaptation A Proposal to move forward on the UNFCCC Global Stocktake 1 2024 \n\nUNITED NATIONS DEVELOPMENT PROGRAMME Capturing Collective Progress on Adaptation A Proposal to move forward on the UNFCCC Global Stocktake 1 2024 \n\nLead Author: Joel B. Smith, Independent Researcher, USA.\n\nCo-Authors: Rohini Kohli, UNDP Prakash Bista, UNDP Patricia Velasco, UNDP\n\nContributing Authors: Myles Whittaker Olivia Diaz',
 'metadata': {'file_directory': 'outputs/publications',
              'filename': 'publication_0.pdf',
              'filetype': 'application/pdf',
              'languages': ['eng'],
              'last_modified': '2024-11-07T00:10:34',
              'page_number': 1}}
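
One way to narrow this down is to check whether chunk 0 comes back from the similarity search at all, before the LLM sees anything. The following is a minimal sketch, assuming `vectorstore` is the LangChain vector store from the snippet above and that it exposes `similarity_search_with_score`. The helper name `rank_of_marker` and the marker string are hypothetical, chosen only for illustration.

```python
def rank_of_marker(vectorstore, query, marker, k=10):
    """Return the 1-based rank of the first hit whose text contains `marker`,
    or None if no such hit is among the top `k` results."""
    # similarity_search_with_score returns a list of (Document, score) pairs.
    hits = vectorstore.similarity_search_with_score(query, k=k)
    for rank, (doc, _score) in enumerate(hits, start=1):
        if marker in doc.page_content:
            return rank
    return None

# Hypothetical usage against the real index:
# rank_of_marker(vectorstore, "Who are the co-authors?", "Co-Authors:")
```

If this returns None, or a rank beyond the number of documents the retriever passes to the chain, the problem is likely in retrieval (chunk size, k, embeddings) rather than in the prompt.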

Deepti_Prasad · November 17, 2024, 6:13pm

In your QA chain prompt, include a SQL query that states how you want the data to be retrieved, specifically mentioning lead author and co-author.

Check the metadata/utils.py file; you would need to make this edit yourself, based on your data type and prompt design.
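
A minimal sketch of one way to read that advice: pull the author lines out of each chunk's text and store them as metadata fields before the chunks are embedded and upserted. It assumes the author lines always start with the labels seen in the sample chunk ("Lead Author:", "Co-Authors:", "Contributing Authors:"). The function name `extract_author_metadata` and the metadata keys are hypothetical, not part of any library.

```python
import re

# Hypothetical mapping from metadata key to the label used in the PDF text.
AUTHOR_LABELS = {
    "lead_author": "Lead Author",
    "co_authors": "Co-Authors",
    "contributing_authors": "Contributing Authors",
}

def extract_author_metadata(text):
    """Return a dict of author fields found as 'Label: value' lines in `text`."""
    found = {}
    for key, label in AUTHOR_LABELS.items():
        # Capture everything after the label up to the end of that line.
        match = re.search(rf"^{re.escape(label)}:\s*(.+)$", text, flags=re.MULTILINE)
        if match:
            found[key] = match.group(1).strip()
    return found

# Hypothetical usage on a chunk dict shaped like the sample in the first post:
# chunk["metadata"].update(extract_author_metadata(chunk["text"]))
```

Chunks without any of these labels simply get no extra fields, so the same pass can run over every chunk.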

Abad100 · November 18, 2024, 11:40am

My metadata is in a vector DB, not in a SQL DB.

Deepti_Prasad · November 18, 2024, 12:52pm

@Abad100

I meant that the prompt for author and co-author needs to be set up in the metadata, so that the output gives you both pieces of information.

There are GitHub repos that explain how to do this.

Please check there.

Regards
DP

Abad100 · November 18, 2024, 7:36pm

How do I define the retriever for this?
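
For reference, one possible shape for such a retriever, as a sketch rather than a confirmed fix: LangChain's `as_retriever` accepts a `search_kwargs` dict, which can carry a larger `k` and, for stores that support it, a metadata `filter`. The helper name `build_retriever`, the value `k=8`, and the `page_number` filter are assumptions for illustration; the filter only works if `page_number` was actually stored as metadata when the vectors were upserted.

```python
def build_retriever(vectorstore, k=8, metadata_filter=None):
    """Wrap vectorstore.as_retriever with a chosen k and an optional metadata filter."""
    search_kwargs = {"k": k}
    if metadata_filter:
        # Passed through to the vector store as its metadata filter.
        search_kwargs["filter"] = metadata_filter
    return vectorstore.as_retriever(search_kwargs=search_kwargs)

# Hypothetical usage with the chain from the first post:
# retriever = build_retriever(vectorstore, k=8, metadata_filter={"page_number": 1})
# qa_chain = RetrievalQA.from_chain_type(
#     llm,
#     retriever=retriever,
#     return_source_documents=True,
#     chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
# )
```

Raising `k` gives chunk 0 more chances to reach the prompt, while the filter restricts the search to first-page chunks, where author lists usually sit.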
