My code for C1M2 Exercise 2
def bm25_retrieve(query: str, top_k: int = 5):
runs without errors
and:
bm25_retrieve("What are the recent news about GDP?")
using the default top_k=5
gives the expected output:
[752, 673, 289, 626, 43]
but unittests.test_bm25_retrieve(bm25_retrieve),
which specifies no top_k and so should default to 5,
gives:
“Failed test case: output has wrong length when using top_k=3”
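The error message names top_k=3 even though the call itself passes no top_k, which suggests the test invokes the function with its own explicit top_k values, so the default of 5 never comes into play. A minimal sketch of such a check, assuming that behaviour (check_length, hardcoded and respects_top_k are hypothetical names, not the contents of the course's unittests.py):

```python
def check_length(retrieve_fn, query="What are the recent news about GDP?"):
    # Hypothetical grader-style check: it passes top_k explicitly,
    # so the function's own default (top_k=5) is never used here.
    for k in (3, 5):
        out = retrieve_fn(query, top_k=k)
        if len(out) != k:
            return f"Failed test case: output has wrong length when using top_k={k}"
    return "passed"


def hardcoded(query: str, top_k: int = 5):
    # Ignores top_k, so it only looks correct for the default call.
    return [752, 673, 289, 626, 43]


def respects_top_k(query: str, top_k: int = 5):
    # Returns exactly top_k results.
    return [752, 673, 289, 626, 43][:top_k]


print(check_length(hardcoded))       # Failed test case: output has wrong length when using top_k=3
print(check_length(respects_top_k))  # passed
```

The staff reply below attributes this particular failure to re-indexing inside the function rather than to ignoring top_k; the sketch only shows why getting the right output for the default call does not rule out a top_k=3 failure.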
The same bug shows up in the unit test for Exercise 2, where my submission again produces the expected_indices:
[743, 673, 626, 752, 326]
I tried editing the provided unittests.py file to use a top_k of 5 and the expected_indices, but it still errored. Although edits are annoyingly slow, I edited this file yesterday, so is the unit tester using “my” unittests.py? And how do I submit bugs to the RAG course authors?
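On the question of whether the notebook sees an edited unittests.py: Python caches a module in sys.modules after its first import, so a running kernel keeps using the old code until the module is reloaded with importlib.reload or the kernel is restarted, and module.__file__ shows which file was actually loaded. A minimal sketch of that behaviour using a throwaway module (fake_unittests and EXPECTED_TOP_K are hypothetical names); it says nothing about which copy the course's submission grader uses.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip writing .pyc files so a quick rewrite of the source is never masked by stale bytecode.
sys.dont_write_bytecode = True


def demo_reload():
    tmp = Path(tempfile.mkdtemp())
    mod_file = tmp / "fake_unittests.py"
    mod_file.write_text("EXPECTED_TOP_K = 3\n")
    sys.path.insert(0, str(tmp))
    importlib.invalidate_caches()
    try:
        import fake_unittests                       # first import: module is cached
        before = fake_unittests.EXPECTED_TOP_K
        mod_file.write_text("EXPECTED_TOP_K = 5\n")  # edit the file on disk
        still_cached = fake_unittests.EXPECTED_TOP_K  # old value, edit not seen yet
        importlib.reload(fake_unittests)             # re-executes the edited source
        after = fake_unittests.EXPECTED_TOP_K
        return before, still_cached, after, fake_unittests.__file__
    finally:
        sys.path.remove(str(tmp))
        sys.modules.pop("fake_unittests", None)


print(demo_reload()[:3])  # (3, 3, 5)
```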
Hey @jimstutt, thanks for flagging this. I’ve looked into it and found a bug in the provided code in the notebook for Exercise 1. I’m going to flag this to the course staff to get it fixed. In the meantime, here’s how you can move forward with the assignment:
The bug: the retriever is already indexed globally in the setup cell above:
BM25_RETRIEVER.index(TOKENIZED_DATA)
so there is no need to re-index inside the graded bm25_retrieve() function. You can safely remove the following two lines:
# Index the tokenized chunks with the retriever
BM25_RETRIEVER.index(None)
Then proceed with the exercise by replacing the remaining None placeholders with your solutions. That should resolve the test failure and let you continue.
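The fix amounts to an index-once, retrieve-many structure: the setup cell builds the index, and the graded function only tokenizes the query and retrieves. A minimal pure-Python sketch of that structure, assuming a toy BM25 scorer (ToyBM25, tokenize and the four-document DATA are hypothetical stand-ins, not the bm25s library or the course data); only the global names mirror the notebook's BM25_RETRIEVER and TOKENIZED_DATA.

```python
import math
from collections import Counter


class ToyBM25:
    """Hypothetical minimal BM25 retriever; not the bm25s library."""

    def __init__(self, k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tfs = []

    def index(self, tokenized_docs):
        # This toy rejects None instead of building an empty or broken index.
        if tokenized_docs is None:
            raise ValueError("index() needs tokenized documents, got None")
        self.doc_tfs = [Counter(doc) for doc in tokenized_docs]
        self.doc_lens = [len(doc) for doc in tokenized_docs]
        self.avgdl = sum(self.doc_lens) / len(self.doc_lens)
        n = len(tokenized_docs)
        df = Counter(tok for doc in tokenized_docs for tok in set(doc))
        # BM25 IDF with +1 inside the log so it stays positive.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def retrieve(self, query_tokens, k: int):
        scores = []
        for tf, dl in zip(self.doc_tfs, self.doc_lens):
            score = 0.0
            for tok in query_tokens:
                f = tf.get(tok, 0)
                if f:
                    norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                    score += self.idf[tok] * f * (self.k1 + 1) / norm
            scores.append(score)
        # Indices of the k highest-scoring documents.
        return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def tokenize(text: str):
    return text.lower().replace("?", "").split()


DATA = [
    "GDP growth slowed this quarter",
    "Stock markets rallied on Friday",
    "Recent news on GDP and inflation",
    "Sunny weather expected this weekend",
]
TOKENIZED_DATA = [tokenize(d) for d in DATA]

# Setup cell: index once, globally.
BM25_RETRIEVER = ToyBM25()
BM25_RETRIEVER.index(TOKENIZED_DATA)


def bm25_retrieve(query: str, top_k: int = 5):
    # No BM25_RETRIEVER.index(...) call here: reuse the global index.
    return BM25_RETRIEVER.retrieve(tokenize(query), k=top_k)


print(bm25_retrieve("What are the recent news about GDP?", top_k=2))  # [2, 0]
```

Because the index lives outside the function, every call (whatever top_k the caller or a unit test passes) ranks against the same corpus.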
Let me know if that works for you!
Hi Lauren,
I made the suggested edit, viz.:
# BM25_RETRIEVER.index(TOKENIZED_DATA) in cell [15]
Now, after cell [17], bm25_retrieve("What…") gives the correct expected output, as in my original post, and I have edited unittests.py from top_k=3 to top_k=5, again as in my original post.
unittests.test_bm25_retrieve(bm25_retrieve) now returns 5 indices (is this because of my unittests.py edit?) but different values, viz.:
“Failed test case: Incorrect output indices.
Expected: [752,653,289,626,43]
Got: [863, 848, 716, 352, 36]”
Hey @jimstutt, thanks for the detailed follow-up. I suspect the unittests.py edits are the culprit here. To reset everything cleanly, I recommend refreshing your workspace entirely (you can find the instructions in M1 under “(Optional) Downloading your Notebook and Refreshing your Workspace”). Just make sure to save your solution code locally so you can reapply it afterward. Once you’re back in a fresh environment, re-run the notebook (without re-indexing BM25_RETRIEVER inside bm25_retrieve, as discussed earlier).
Hopefully that will resolve the mismatch. Let me know how it goes!