Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata

Hi, I am running this code from the provided notebook:

# Build one Document per parsed element
documents = []
for element in elements:
    metadata = element.metadata.to_dict()
    del metadata["languages"]
    metadata["source"] = metadata["filename"]
    documents.append(Document(page_content=element.text, metadata=metadata))

# Embed the documents and load them into Chroma
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)

Running the vectorstore cell gives the following error, and I am unsure what the right fix is.

Error:
ValueError: Expected metadata value to be a str, int, float or bool, got [{'x': 0, 'y': 0, 'w': 1, 'h': 1, 'content': 'NAVER CLOVA'}, {'x': 1, 'y': 0, 'w': 1, 'h': 1, 'content': '2NAVER Search'}, {'x': 2, 'y': 0, 'w': 1, 'h': 1, 'content': '3SNAVER AI Lal'}] which is a <class 'list'>

Try filtering complex metadata from the document using langchain_community.vectorstores.utils.filter_complex_metadata.
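
To see which metadata fields trigger this error, it can help to list every key whose value is not a str, int, float or bool. A minimal sketch, using a plain dictionary in place of the notebook's Document objects; the helper name find_complex_keys, the key "cells" and the sample values are hypothetical, since the error does not say which key holds the list:

```python
# Types that the error message says are accepted as metadata values.
ALLOWED_TYPES = (str, int, float, bool)


def find_complex_keys(metadata):
    """Return the keys whose values are not plain scalars."""
    return [key for key, value in metadata.items()
            if not isinstance(value, ALLOWED_TYPES)]


# Hypothetical metadata shaped like the value in the error above.
sample = {
    "filename": "example.pdf",
    "page_number": 1,
    "cells": [{"x": 0, "y": 0, "w": 1, "h": 1, "content": "NAVER CLOVA"}],
}

complex_keys = find_complex_keys(sample)
print(complex_keys)
```

In the notebook, the same check could presumably be run as find_complex_keys(doc.metadata) for each doc in documents.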

Are you sure you are using all the files provided by the course, so that your metadata matches what the course expects? The course mentions that this issue comes down to how the metadata is loaded.

I ran into the same issue, which I solved like this:

First, import the function mentioned in the error:

from langchain_community.vectorstores.utils import filter_complex_metadata

Then change this line:

vectorstore = Chroma.from_documents(documents, embeddings)

to this:

vectorstore = Chroma.from_documents(filter_complex_metadata(documents), embeddings)
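
As far as I understand it, filter_complex_metadata keeps only the metadata entries whose values are str, bool, int or float and drops everything else, so the offending list never reaches Chroma. A rough stand-in for that behaviour on a plain dictionary; drop_complex_values and the sample data are hypothetical and this is not the library's own code:

```python
def drop_complex_values(metadata, allowed_types=(str, bool, int, float)):
    """Keep only the entries whose values are plain scalars."""
    return {key: value for key, value in metadata.items()
            if isinstance(value, allowed_types)}


# Hypothetical metadata with one list-valued field.
before = {
    "source": "example.pdf",
    "page_number": 1,
    "cells": [{"x": 0, "y": 0, "w": 1, "h": 1}],
}

after = drop_complex_values(before)
print(after)
```

Note that the dropped fields are gone from the stored metadata, so anything you want to filter on later has to be a scalar before this step.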

Thanks for sharing how to wrap this up with filter_complex_metadata.
But how do I now use this as a filter option, as in vector_store.as_retriever(search_kwargs={"filter": {"source_name": "ABC"}})?
source_name is a list here.
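
One caveat: since filter_complex_metadata drops list-valued fields, a list-valued source_name would no longer be in the stored metadata to filter on. One possible workaround (an assumption on my part, not something from the course) is to expand the list into one boolean flag per value before building the vector store. A minimal sketch on a plain dictionary; expand_list_field, the "field:value" key format and the sample values are all hypothetical:

```python
def expand_list_field(metadata, field):
    """Replace a list-valued field with one boolean flag per list item."""
    value = metadata.get(field)
    if not isinstance(value, list):
        # Nothing to expand: return an unchanged copy.
        return dict(metadata)
    expanded = {key: val for key, val in metadata.items() if key != field}
    for item in value:
        expanded[f"{field}:{item}"] = True
    return expanded


# Hypothetical metadata where source_name is a list.
meta = {"source": "example.pdf", "source_name": ["ABC", "XYZ"]}
flat = expand_list_field(meta, "source_name")
print(flat)

# Every value is now a scalar, so a filter on one flag could look like this.
flag_filter = {"source_name:ABC": True}
```

With metadata stored this way, something like vector_store.as_retriever(search_kwargs={"filter": flag_filter}) should match only documents whose original list contained "ABC", though I have not tested this against the course notebook.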

Thanks! This worked for me using the hosted notebook.
