Retrieval issue

Chunk 0 contains the co-author info, but RAG is not able to retrieve it when I ask who the co-authors are. (I used my own PDF.)
I created the embeddings with text-embedding-ada-002 and use Pinecone to store them.
I am using GPT-3.5 Turbo for prompting. This is my retriever:
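from langchain.chains import RetrievalQA

# llm, vectorstore and QA_CHAIN_PROMPT are defined earlier in my code
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)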

How can I solve this issue?
I created the chunks using unstructured's chunk_by_title:
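from unstructured.chunking.title import chunk_by_title

# pdf_elements is the list of elements partitioned from the PDF
chunks = chunk_by_title(
    pdf_elements,
    combine_text_under_n_chars=100,
    max_characters=3000,
)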
Here is a sample chunk item:
{'type': 'CompositeElement',
 'element_id': '70f96630-9e05-4f04-80e0-eb04c02866bf',
 'text': 'Capturing Collective Progress on Adaptation: A Proposal to move forward on the UNFCCC Global Stocktake\n\nUNITED NATIONS DEVELOPMENT PROGRAMME Capturing Collective Progress on Adaptation A Proposal to move forward on the UNFCCC Global Stocktake 1 2024 \n\nUNITED NATIONS DEVELOPMENT PROGRAMME Capturing Collective Progress on Adaptation A Proposal to move forward on the UNFCCC Global Stocktake 1 2024 \n\nLead Author: Joel B. Smith, Independent Researcher, USA.\n\nCo-Authors: Rohini Kohli, UNDP Prakash Bista, UNDP Patricia Velasco, UNDP\n\nContributing Authors: Myles Whittaker Olivia Diaz',
 'metadata': {'file_directory': 'outputs/publications',
  'filename': 'publication_0.pdf',
  'filetype': 'application/pdf',
  'languages': ['eng'],
  'last_modified': '2024-11-07T00:10:34',
  'page_number': 1}}

In your QA chain prompt, include an SQL query that states how you want the data to be retrieved, specifically mentioning the lead author and co-authors.

Check the metadata/utils.py file; you will need to make this edit yourself, based on your data type and prompt design.

My metadata is in the vector DB, not in a SQL DB.

@Abad100

I meant that the prompt for the author and co-authors needs to be handled in the metadata, so that the output gives you both pieces of information.

There are GitHub repos that show how to do this.

Please check there.

Regards
DP
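
One possible reading of the suggestion above, as a minimal sketch only: pull the author lines out of the chunk text and store them as extra metadata fields before upserting to the vector DB. The function name extract_author_metadata and the resulting key names are hypothetical, and the labels in the regex are taken from the sample chunk, so other PDFs will likely need different patterns.

```python
import re

# Labels exactly as they appear in the sample chunk text (assumption:
# other documents may label their authors differently).
AUTHOR_LINE = re.compile(
    r"^(Lead Author|Co-Authors|Contributing Authors):\s*(.+)$",
    re.MULTILINE,
)


def extract_author_metadata(text: str) -> dict:
    """Return the author lines keyed by a normalized label, e.g. 'co_authors'."""
    found = {}
    for label, value in AUTHOR_LINE.findall(text):
        key = label.lower().replace("-", "_").replace(" ", "_")
        found[key] = value.strip()
    return found


# Tail of the 'text' field from the sample chunk in the question.
sample_text = (
    "Lead Author: Joel B. Smith, Independent Researcher, USA.\n\n"
    "Co-Authors: Rohini Kohli, UNDP Prakash Bista, UNDP Patricia Velasco, UNDP\n\n"
    "Contributing Authors: Myles Whittaker Olivia Diaz"
)

author_metadata = extract_author_metadata(sample_text)
print(author_metadata["co_authors"])
```

The resulting dict could be merged into each chunk's metadata before it is stored. Values are kept as plain strings here, which should be compatible with vector stores that only accept simple metadata types.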

How do I define a retriever for this?
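
A hedged sketch of one way to define the retriever: pass search_kwargs to as_retriever() so that more candidates are returned and, optionally, a metadata filter is applied. As far as I know, the default retriever returns only 4 documents, so a title-page chunk can easily be outranked. The helper name make_retriever, the value k=8, and the page_number filter are assumptions for illustration; the filter only works if that field was actually stored with the vectors.

```python
def make_retriever(vectorstore, k=8, metadata_filter=None):
    """Build a retriever that returns k chunks, optionally filtered by metadata.

    `vectorstore` is expected to expose LangChain's
    `as_retriever(search_kwargs=...)` method.
    """
    search_kwargs = {"k": k}
    if metadata_filter:
        # Passed through to the underlying vector store's filter syntax.
        search_kwargs["filter"] = metadata_filter
    return vectorstore.as_retriever(search_kwargs=search_kwargs)


# Example usage (assumes `vectorstore` is the Pinecone store from the question
# and that 'page_number' was saved as metadata when upserting):
# retriever = make_retriever(vectorstore, k=8, metadata_filter={"page_number": 1})
# qa_chain = RetrievalQA.from_chain_type(
#     llm,
#     retriever=retriever,
#     return_source_documents=True,
#     chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
# )
```

Checking the returned source_documents for the author question should show whether chunk 0 is being retrieved at all, or retrieved but ignored by the prompt.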