I’m having problems with the first exercise of the assignment.
My code:
GRADED CELL
edited as it contained solution block
The problem:
ValueError Traceback (most recent call last)
Cell In[41], line 2
1 # Output is a list of indices
----> 2 bm25_retrieve("What are the recent news about GDP?")
Cell In[40], line 29, in bm25_retrieve(query, top_k)
25 BM25_RETRIEVER.index(tokenized_query)
27 # Use the 'BM25_RETRIEVER' to retrieve documents and their scores based on the tokenized query
28 # Retrieve the top 'k' documents
---> 29 results, scores = BM25_RETRIEVER.retrieve(tokenized_query, k=top_k)
31 # Extract the first element from 'results' to get the list of retrieved documents
32 results = results[0]
File /opt/conda/lib/python3.12/site-packages/bm25s/__init__.py:696, in BM25.retrieve(self, query_tokens, corpus, k, sorted, return_as, show_progress, leave_progress, n_threads, chunksize, backend_selection, weight_mask)
694 num_docs = self.scores["num_docs"]
695 if k > num_docs:
--> 696 raise ValueError(
697 f"k of {k} is larger than the number of available scores"
698 f", which is {num_docs} (corpus size should be larger than top-k)."
699 f" Please set with a smaller k or increase the size of corpus."
700 )
701 allowed_return_as = ["tuple", "documents"]
703 if return_as not in allowed_return_as:
ValueError: k of 5 is larger than the number of available scores, which is 1 (corpus size should be larger than top-k). Please set with a smaller k or increase the size of corpus.
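
The traceback shows `BM25_RETRIEVER.index(tokenized_query)` on line 25 of the cell, and the error reports only 1 available score. That would be consistent with the retriever having been re-indexed on the query alone, so that its "corpus" is a single document. Below is a minimal sketch of that effect. It assumes indexing replaces the previous index, and it uses a hypothetical `ToyRetriever` class with toy overlap scoring as a stand-in. This is not the bm25s API and not the assignment solution.

```python
class ToyRetriever:
    """Hypothetical stand-in for a retriever; NOT the bm25s API."""

    def __init__(self):
        self.docs = []

    def index(self, tokenized_docs):
        # Assumption: indexing replaces whatever was indexed before.
        self.docs = list(tokenized_docs)

    def retrieve(self, tokenized_query, k):
        # Mimics the size check shown in the traceback (k > num_docs).
        num_docs = len(self.docs)
        if k > num_docs:
            raise ValueError(
                f"k of {k} is larger than the number of available scores, "
                f"which is {num_docs}"
            )
        # Toy scoring (not BM25): count query terms shared with each document.
        query_terms = set(tokenized_query)
        scores = [len(query_terms & set(doc)) for doc in self.docs]
        ranked = sorted(range(num_docs), key=lambda i: scores[i], reverse=True)
        return ranked[:k]


# Made-up six-document corpus, already tokenized.
corpus = [
    ["gdp", "grew", "last", "quarter"],
    ["new", "phone", "released"],
    ["central", "bank", "raises", "rates"],
    ["gdp", "news", "from", "europe"],
    ["football", "results"],
    ["weather", "forecast"],
]
query = ["recent", "news", "about", "gdp"]

# Index built once on the corpus: asking for the top 5 works.
retriever = ToyRetriever()
retriever.index(corpus)
top = retriever.retrieve(query, k=5)

# Index rebuilt on the query: only one "document" is left, so k=5 fails.
broken = ToyRetriever()
broken.index(corpus)
broken.index([query])
try:
    broken.retrieve(query, k=5)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(top)
print(error_message)
```

If the same thing is happening in the graded cell, the retriever should stay indexed on the corpus, and the tokenized query should only be passed to `retrieve`.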