RAG Module 2 bm25_retrieve graded exercise error

jdelfosse · October 26, 2025, 6:29pm

Hi,

This exercise is confusing.

Hint 2 is:

“Make sure the corpus is indexed. This can be done by preparing the retriever with the document data before performing retrieval. Use the .index method of BM25_RETRIEVER.”

However, indexing was already done two cells earlier (in the example), and it is done again in the cell just before the graded cell.

I ran the cell before the graded cell and was able to print TOKENIZED_DATA.

The tokenized query in the graded cell is:

Tokenized(
  "ids": [
    0: [0, 1, 2, 3, 4]
  ],
  "vocab": [
    'about': 3
    'gdp': 4
    'news': 2
    'recent': 1
    'what': 0
  ],
)

Then, when I called BM25_RETRIEVER.retrieve() with the tokenized query and top_k, I got the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[14], line 2
      1 # Output is a list of indices
----> 2 bm25_retrieve("What are the recent news about GDP?")

Cell In[13], line 28, in bm25_retrieve(query, top_k)
     24 print(TOKENIZED_DATA)
     26 # Use the 'BM25_RETRIEVER' to retrieve documents and their scores based on the tokenized query
     27 # Retrieve the top 'k' documents
---> 28 results, scores = BM25_RETRIEVER.retrieve(tokenized_query, top_k)
     29 print("B")
     31 # Extract the first element from 'results' to get the list of retrieved documents

File /usr/local/lib/python3.11/site-packages/bm25s/__init__.py:866, in BM25.retrieve(self, query_tokens, corpus, k, sorted, return_as, show_progress, leave_progress, n_threads, chunksize, backend_selection, weight_mask)
    864     else:
    865         index_flat = indices.flatten().tolist()
--> 866         results = [corpus[i] for i in index_flat]
    867         retrieved_docs = np.array(results).reshape(indices.shape)
    869 if return_as == "tuple":

File /usr/local/lib/python3.11/site-packages/bm25s/__init__.py:866, in <listcomp>(.0)
    864     else:
    865         index_flat = indices.flatten().tolist()
--> 866         results = [corpus[i] for i in index_flat]
    867         retrieved_docs = np.array(results).reshape(indices.shape)
    869 if return_as == "tuple":

TypeError: 'int' object is not subscriptable

lukmanaj · October 27, 2025, 12:53am

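I think the main issue is how you called the retrieve method. As the signature in your traceback shows, the second positional parameter of retrieve is corpus, not k, so top_k was bound to corpus and corpus[i] then failed on an int. The top_k value should be passed through the keyword argument k instead.
Check out the documentation here.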

Here’s the example given there:

import bm25s

# Create your corpus here
corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
]

# Tokenize the corpus and index it
corpus_tokens = bm25s.tokenize(corpus)
retriever = bm25s.BM25(corpus=corpus)
retriever.index(corpus_tokens)

# You can now search the corpus with a query
query = "does the fish purr like a cat?"
query_tokens = bm25s.tokenize(query)
docs, scores = retriever.retrieve(query_tokens, k=2)  # check this line: the top-k value is passed as the keyword argument k
print(f"Best result (score: {scores[0, 0]:.2f}): {docs[0, 0]}")

# Happy with your index? Save it for later...
retriever.save("bm25s_index_animals")

# ...and load it when needed
ret_loaded = bm25s.BM25.load("bm25s_index_animals", load_corpus=True)
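
To make the cause of the TypeError concrete, here is a minimal sketch that does not use bm25s at all. The retrieve function below is a hypothetical stand-in that copies only the parameter order visible in the traceback (query_tokens, corpus, k); its default values and its body are assumptions made for illustration, not the library's implementation.

```python
import inspect

# Hypothetical stand-in, NOT the bm25s implementation. It copies only the
# parameter order shown in the traceback; the defaults are assumptions.
def retrieve(query_tokens, corpus=None, k=10):
    top_indices = list(range(k))  # pretend these are the top-k document ids
    if corpus is None:
        return top_indices
    # Same pattern as the failing line in the traceback
    return [corpus[i] for i in top_indices]

def describe_binding(*args, **kwargs):
    """Return which parameter each supplied argument is bound to."""
    bound = inspect.signature(retrieve).bind(*args, **kwargs)
    return dict(bound.arguments)

tokens = ["recent", "news", "gdp"]

positional = describe_binding(tokens, 5)  # 5 lands in 'corpus'
keyword = describe_binding(tokens, k=5)   # 5 lands in 'k'
print(positional)
print(keyword)

# Positional call: corpus is the int 5, so corpus[i] raises TypeError
try:
    retrieve(tokens, 5)
    error_message = ""
except TypeError as err:
    error_message = str(err)
print(error_message)

# Keyword call: k receives the value and the call succeeds
print(retrieve(tokens, k=5))
```

In the graded cell the equivalent change is to pass top_k as k=top_k rather than as the second positional argument.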

lukmanaj · October 27, 2025, 12:54am

Also check this post:

jdelfosse · October 27, 2025, 10:15am

Many thanks, I corrected the call and was able to complete Module 2.
