C1M2 Exercise 1 - Returning more documents than available

andersrmr · July 24, 2025, 8:29pm

Following the example in section 3.2 BM25 Retrieve, I expected that the global variables would already be instantiated and that I would not need to recreate them within the assignment function.

Why does line 29,
results, scores = BM25_RETRIEVER.retrieve(tokenized_query, k=top_k)
give an error saying that k=5 is larger than the number of available documents (1)?

Why doesn’t BM25_RETRIEVER see the entire corpus as in the example, since the corpus has already been indexed based on the global variables?

lucas.coutinho · July 24, 2025, 8:57pm

Hi!

You are correct, you shouldn’t need to reindex it, since BM25_RETRIEVER was already indexed earlier. Would you mind sharing your entire solution with me via direct message?

Thanks,
Lucas

andersrmr · July 24, 2025, 10:40pm

The problem is the following line within the solution starter function:

# Index the tokenized chunks with the retriever
BM25_RETRIEVER.index(tokenized_query)

This step was already performed in the previous cell on TOKENIZED_DATA. Executing this line inside the function overwrites the corpus index with the query, so the retriever only sees one document. That is what causes my error. Do I need to re-execute this step on TOKENIZED_DATA, or can I just skip it?
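
To illustrate the overwrite, here is a minimal sketch using a toy stand-in class. ToyRetriever, its overlap scoring, and the sample corpus are hypothetical and written only for this example; this is not the retriever from the assignment. The only behavior assumed to carry over is that calling index() replaces whatever was indexed before, and that retrieve() complains when k exceeds the number of indexed documents.

```python
class ToyRetriever:
    """Hypothetical stand-in for a BM25 retriever (not the real library API)."""

    def __init__(self):
        self.docs = []

    def index(self, tokenized_docs):
        # Indexing replaces the previous index; it does not append to it.
        self.docs = list(tokenized_docs)

    def retrieve(self, tokenized_query, k):
        if k > len(self.docs):
            raise ValueError(
                f"k={k} is larger than available documents={len(self.docs)}"
            )
        query = set(tokenized_query)
        # Simple token-overlap score, used here instead of real BM25 scoring.
        scores = [len(query & set(doc)) for doc in self.docs]
        ranked = sorted(range(len(self.docs)), key=lambda i: scores[i], reverse=True)
        return ranked[:k]


# Made-up corpus and query, already tokenized.
corpus = [
    ["bm25", "ranks", "documents"],
    ["queries", "are", "tokenized"],
    ["index", "the", "corpus", "once"],
]
query = ["index", "corpus"]

retriever = ToyRetriever()
retriever.index(corpus)               # done once, outside the function
top = retriever.retrieve(query, k=2)  # works: three documents are indexed

retriever.index([query])              # the mistake: the query becomes the whole index
try:
    retriever.retrieve(query, k=2)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(top)
print(error_message)
```

In this sketch the second retrieve() call fails because the index now holds a single "document" (the query itself), which is the same shape of error as the one above.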

andersrmr · July 24, 2025, 11:42pm

I went ahead and used

BM25_RETRIEVER.index(TOKENIZED_DATA)

and it worked. Not sure if that line was necessary.

bforbesc · August 13, 2025, 11:46am

This worked for me as well. The comment “# Index the tokenized chunks with the retriever” should be removed from the assignment.
