Hi,
In lesson 5 of the LangChain: Chat with Your Data course, they first show a basic MMR search, then the SelfQueryRetriever, which lets us filter on metadata, and later the ContextualCompressionRetriever.
For the basic vector store search I can simply pass a filter dictionary to restrict the search to, for example, the documents on page 2 or from one source file. For the SelfQueryRetriever, I pass metadata_field_info, a list of AttributeInfo objects, so that the query constructor knows which fields it can filter on.
Yet for the ContextualCompressionRetriever I can’t find a way to include those filters. I want to be able to, for example, search only page 2 or only one specific document among those files.
In the course, they do it like this:
Basic search with a metadata filter:
docs = vectordb.similarity_search(
    question,
    k=3,
    filter={"source": "docs/cs229_lectures/MachineLearning-Lecture03.pdf"},
)
SelfQueryRetriever:
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

# Describe the metadata fields the LLM may use when building a filter
metadata_field_info = [
    AttributeInfo(
        name="source",
        description="The lecture the chunk is from, should be one of `docs/cs229_lectures/MachineLearning-Lecture01.pdf`, `docs/cs229_lectures/MachineLearning-Lecture02.pdf`, or `docs/cs229_lectures/MachineLearning-Lecture03.pdf`",
        type="string",
    ),
    AttributeInfo(
        name="page",
        description="The page from the lecture",
        type="integer",
    ),
]
document_content_description = "Lecture notes"
llm = OpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True,
)
And later like this:
ContextualCompressionRetriever:
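from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Wrap our vectorstore
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type="mmr"),
)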
How do I add metadata_field_info and a filter to the ContextualCompressionRetriever?
Passing metadata_field_info=metadata_field_info to as_retriever doesn’t seem to work:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(
        search_type="mmr", metadata_field_info=metadata_field_info
    ),
)
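To make the question more concrete, here is a small pure-Python sketch of how I understand the wrapping: the compression retriever only post-processes whatever base_retriever returns, so a metadata filter would presumably have to be configured on the base retriever, not on the wrapper. Everything in it (Doc, ToyVectorStoreRetriever, ToyCompressionRetriever, keep_matching_sentences, the sample texts) is a hypothetical stand-in and not LangChain's API. The word-count ranking replaces real embeddings or MMR, and the sentence filter replaces the LLM-based extractor.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


class ToyVectorStoreRetriever:
    """Toy base retriever: applies an exact-match metadata filter, then ranks."""

    def __init__(self, docs: List[Doc], search_kwargs: Optional[dict] = None):
        self.docs = docs
        self.search_kwargs = search_kwargs or {}

    def get_relevant_documents(self, query: str) -> List[Doc]:
        metadata_filter = self.search_kwargs.get("filter", {})
        k = self.search_kwargs.get("k", 4)
        # Keep only documents whose metadata matches every filter key exactly.
        matches = [
            d for d in self.docs
            if all(d.metadata.get(key) == value
                   for key, value in metadata_filter.items())
        ]
        # Toy ranking (assumption): count query words found in the text.
        words = query.lower().split()
        matches.sort(key=lambda d: -sum(w in d.text.lower() for w in words))
        return matches[:k]


class ToyCompressionRetriever:
    """Toy wrapper: delegates retrieval, then compresses each returned document."""

    def __init__(self, base_retriever: ToyVectorStoreRetriever,
                 compress: Callable[[str, str], str]):
        self.base_retriever = base_retriever
        self.compress = compress

    def get_relevant_documents(self, query: str) -> List[Doc]:
        # The wrapper never sees the filter; it only sees the filtered results.
        docs = self.base_retriever.get_relevant_documents(query)
        return [Doc(self.compress(d.text, query), d.metadata) for d in docs]


def keep_matching_sentences(text: str, query: str) -> str:
    """Toy compressor: keep only sentences that contain a query word."""
    words = query.lower().split()
    kept = [s.strip() for s in text.split(".")
            if any(w in s.lower() for w in words)]
    return ". ".join(kept)


source = "docs/cs229_lectures/MachineLearning-Lecture03.pdf"
corpus = [
    Doc("Regression fits a line. Office hours are Friday.",
        {"source": source, "page": 2}),
    Doc("Regression diagnostics come later. See the syllabus.",
        {"source": source, "page": 5}),
]

# The filter is configured on the base retriever, not on the wrapper.
base = ToyVectorStoreRetriever(corpus, search_kwargs={"k": 3, "filter": {"page": 2}})
retriever = ToyCompressionRetriever(base, keep_matching_sentences)
results = retriever.get_relevant_documents("regression")
for doc in results:
    print(doc.metadata["page"], doc.text)
```

If this picture is right, the LangChain equivalent might be something like vectordb.as_retriever(search_type="mmr", search_kwargs={"filter": {"page": 2}}), assuming the underlying vector store accepts a filter argument for MMR search. I have not confirmed that this is the intended way.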