Chunk 0 contains the co-author information, but the RAG pipeline does not retrieve it when I ask who the co-authors are. (I used my own PDF.)
I created the embeddings with text-embedding-ada-002, stored them in Pinecone, and use gpt-3.5-turbo as the LLM. This is my retriever:
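```python
from langchain.chains import RetrievalQA

# llm, vectorstore (Pinecone) and QA_CHAIN_PROMPT are created earlier in my script
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)
```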
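A likely factor: `vectorstore.as_retriever()` with no arguments runs a plain similarity search and, in LangChain, passes only the top 4 chunks (the default `k`) to the LLM. Chunk 0 repeats the report title three times, so its embedding may sit far from a short question about co-authors and fall outside that top 4. The sketch below imitates the ranking step with made-up 3-dimensional vectors standing in for ada-002 embeddings (real ones have 1536 dimensions); the chunk names and numbers are hypothetical, chosen only to show how `k` decides what the LLM gets to see.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query: list[float], chunks: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k chunks most similar to the query, best first."""
    ranked = sorted(chunks, key=lambda cid: cosine(query, chunks[cid]), reverse=True)
    return ranked[:k]


# Hypothetical toy vectors, NOT real ada-002 embeddings.
chunks = {
    "chunk_0_title_and_authors": [0.2, 0.9, 0.1],
    "chunk_1_acknowledgements": [0.8, 0.5, 0.1],
    "chunk_2_methodology": [0.9, 0.3, 0.2],
    "chunk_3_stocktake_summary": [0.7, 0.6, 0.2],
    "chunk_4_contributors_note": [0.85, 0.45, 0.15],
}
query = [0.9, 0.4, 0.1]  # stand-in for "Who are the co-authors?"

print(top_k(query, chunks, k=4))  # chunk_0_title_and_authors is not in the top 4
print(top_k(query, chunks, k=5))  # now it is included, ranked last
```

In LangChain the corresponding change would be `vectorstore.as_retriever(search_kwargs={"k": 10})`. Since `return_source_documents=True` is already set, checking `result["source_documents"]` for the co-author question shows whether chunk 0 was retrieved at all, which separates a retrieval problem from a prompting problem.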
How can I solve this issue?
I created the chunks with chunk_by_title from unstructured:
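```python
from unstructured.chunking.title import chunk_by_title

# pdf_elements: the elements obtained by partitioning the PDF (e.g. with partition_pdf)
chunks = chunk_by_title(
    pdf_elements,
    combine_text_under_n_chars=100,  # merge small consecutive sections under 100 chars
    max_characters=3000,  # hard upper limit on chunk length
)
```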
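With `max_characters=3000`, the whole title page fits into one chunk, so the co-author line is only a small part of the text that a single embedding has to represent. Lowering `max_characters` may give the author block a more focused chunk of its own. The sketch below is a simplified greedy paragraph packer, not unstructured's actual algorithm, run on a shortened copy of the sample chunk's text to show how the size limit moves chunk boundaries; `pack_paragraphs` is a hypothetical helper.

```python
def pack_paragraphs(text: str, max_chars: int) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars.

    Simplified illustration only: a single paragraph longer than max_chars
    becomes its own (oversized) chunk instead of being split further.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the current chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = para  # start a new chunk with this paragraph
    if current:
        chunks.append(current)
    return chunks


# Shortened from the sample chunk's text (repeated header lines omitted).
sample = (
    "Capturing Collective Progress on Adaptation\n\n"
    "Lead Author: Joel B. Smith, Independent Researcher, USA.\n\n"
    "Co-Authors: Rohini Kohli, UNDP Prakash Bista, UNDP Patricia Velasco, UNDP\n\n"
    "Contributing Authors: Myles Whittaker Olivia Diaz"
)

print(len(pack_paragraphs(sample, max_chars=3000)))  # 1: everything in one chunk
for chunk in pack_paragraphs(sample, max_chars=100):
    print(repr(chunk))  # 4 chunks, one per paragraph
```

The trade-off is that very small chunks lose surrounding context, so in practice a middle value, or a dedicated chunk only for the author block, is usually worth testing against your own questions.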
Here is a sample chunk item:
```python
{'type': 'CompositeElement',
 'element_id': '70f96630-9e05-4f04-80e0-eb04c02866bf',
 'text': 'Capturing Collective Progress on Adaptation: A Proposal to move forward on the UNFCCC Global Stocktake\n\nUNITED NATIONS DEVELOPMENT PROGRAMME Capturing Collective Progress on Adaptation A Proposal to move forward on the UNFCCC Global Stocktake 1 2024 \n\nUNITED NATIONS DEVELOPMENT PROGRAMME Capturing Collective Progress on Adaptation A Proposal to move forward on the UNFCCC Global Stocktake 1 2024 \n\nLead Author: Joel B. Smith, Independent Researcher, USA.\n\nCo-Authors: Rohini Kohli, UNDP Prakash Bista, UNDP Patricia Velasco, UNDP\n\nContributing Authors: Myles Whittaker Olivia Diaz',
 'metadata': {'file_directory': 'outputs/publications',
              'filename': 'publication_0.pdf',
              'filetype': 'application/pdf',
              'languages': ['eng'],
              'last_modified': '2024-11-07T00:10:34',
              'page_number': 1}}
```
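Another option, which works regardless of chunk size, is to clean the chunk before embedding: the sample text contains the same "UNITED NATIONS DEVELOPMENT PROGRAMME" header paragraph twice, and the author lines can be pulled out into metadata or into a short dedicated chunk. The sketch below does both with plain string handling and regular expressions; `dedupe_paragraphs`, `extract_authors` and the label patterns are hypothetical, tuned to this one PDF's layout, and not part of unstructured.

```python
import re


def dedupe_paragraphs(text: str) -> str:
    """Drop verbatim repeats of blank-line-separated paragraphs, keeping first occurrences."""
    seen: set[str] = set()
    kept: list[str] = []
    for para in (p.strip() for p in text.split("\n\n")):
        if para and para not in seen:
            seen.add(para)
            kept.append(para)
    return "\n\n".join(kept)


def extract_authors(text: str) -> dict[str, str]:
    """Pull the author lines by label (layout-specific assumption for this PDF)."""
    authors: dict[str, str] = {}
    for label in ("Lead Author", "Co-Authors", "Contributing Authors"):
        match = re.search(rf"^{re.escape(label)}:\s*(.+)$", text, flags=re.MULTILINE)
        if match:
            authors[label] = match.group(1).strip()
    return authors


# Text copied from the sample chunk above.
chunk_text = (
    "Capturing Collective Progress on Adaptation: A Proposal to move forward on the UNFCCC Global Stocktake\n\n"
    "UNITED NATIONS DEVELOPMENT PROGRAMME Capturing Collective Progress on Adaptation A Proposal to move forward on the UNFCCC Global Stocktake 1 2024 \n\n"
    "UNITED NATIONS DEVELOPMENT PROGRAMME Capturing Collective Progress on Adaptation A Proposal to move forward on the UNFCCC Global Stocktake 1 2024 \n\n"
    "Lead Author: Joel B. Smith, Independent Researcher, USA.\n\n"
    "Co-Authors: Rohini Kohli, UNDP Prakash Bista, UNDP Patricia Velasco, UNDP\n\n"
    "Contributing Authors: Myles Whittaker Olivia Diaz"
)

cleaned = dedupe_paragraphs(chunk_text)
authors = extract_authors(cleaned)
print(cleaned.count("UNITED NATIONS DEVELOPMENT PROGRAMME"))  # 1
print(authors["Co-Authors"])
```

Metadata stored in Pinecone is not embedded, so authors kept only as metadata help with filtering but not with similarity search; embedding a short text such as the Co-Authors line as its own chunk is more likely to make it retrievable by a question about co-authors.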