C1M2 Exercise 1 - Returning more documents than available

Following the example in section 3.2 BM25 Retrieve, I expected the global variables to be instantiated already, so I would not need to recreate them within the assignment function.

Why does line 29:
results, scores = BM25_RETRIEVER.retrieve(tokenized_query, k=top_k)
give an error that k=5 is larger than the number of available documents (1)?

Why doesn’t BM25_RETRIEVER see the entire corpus as in the example, since the corpus has already been indexed based on the global variables?

Hi!

You are correct, you shouldn’t need to reindex it, as BM25_RETRIEVER was already indexed earlier. Would you mind sharing your entire solution with me via direct message?

Thanks,
Lucas

The problem is the following line within the solution starter function:

# Index the tokenized chunks with the retriever
BM25_RETRIEVER.index(tokenized_query)

This step was already performed in the previous cell on TOKENIZED_DATA. Executing this line overwrites the corpus information with the query information, so the retriever sees only one document, which results in my error. Do I need to re-execute this step on TOKENIZED_DATA, or can I just skip it?

I went ahead and used

BM25_RETRIEVER.index(TOKENIZED_DATA)

and it worked. Not sure if that line was necessary.
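
For anyone hitting the same error, here is a minimal sketch of what seems to be happening. ToyRetriever is a hypothetical stand-in made up for illustration, not the actual BM25_RETRIEVER class or its library's API. It only assumes that index() replaces whatever was indexed before and that retrieve() rejects a k larger than the number of indexed documents, which is consistent with the error message above. The sample data is invented too.

```python
from typing import List, Tuple


class ToyRetriever:
    """Hypothetical stand-in for BM25_RETRIEVER; not the real library API."""

    def __init__(self) -> None:
        self.docs: List[List[str]] = []

    def index(self, tokenized_docs: List[List[str]]) -> None:
        # Assumption: indexing replaces the previous corpus, it does not append.
        self.docs = list(tokenized_docs)

    def retrieve(
        self, tokenized_query: List[List[str]], k: int
    ) -> Tuple[List[int], List[int]]:
        if k > len(self.docs):
            raise ValueError(
                f"k={k} is larger than available documents={len(self.docs)}"
            )
        query_terms = set(tokenized_query[0])
        # Term overlap is a crude substitute for real BM25 scoring.
        scores = [len(query_terms & set(doc)) for doc in self.docs]
        ranked = sorted(
            range(len(self.docs)), key=lambda i: scores[i], reverse=True
        )[:k]
        return ranked, [scores[i] for i in ranked]


# Invented sample data: five tokenized documents and one tokenized query.
TOKENIZED_DATA = [
    ["cat", "sat"], ["dog", "ran"], ["cat", "ran"], ["bird", "flew"], ["fish", "swam"],
]
tokenized_query = [["cat", "ran"]]

retriever = ToyRetriever()
retriever.index(TOKENIZED_DATA)    # previous cell: five documents indexed
retriever.index(tokenized_query)   # starter line: the corpus is now one "document"

try:
    retriever.retrieve(tokenized_query, k=5)
    error_message = ""
except ValueError as err:
    error_message = str(err)

retriever.index(TOKENIZED_DATA)    # re-indexing the corpus restores all five
results, scores = retriever.retrieve(tokenized_query, k=5)
```

If the real retriever behaves like this sketch, re-indexing TOKENIZED_DATA inside the function just rebuilds the index from the previous cell, so skipping the index call should give the same results.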

This worked for me as well. The comment “# Index the tokenized chunks with the retriever” should be removed from the assignment.