While grading the assignment "C1M2 - Implementing Retriever Functions in a RAG System", I get the following error for all 3 exercises: "There was a problem compiling the code from your notebook, please check that you saved before submitting. Details: name 'corpus' is not defined". Please find the screenshots attached.
“corpus” is defined as follows:
corpus = [x['title'] + " " + x['description'] for x in NEWS_DATA]
I did save before submitting. All unit tests are passing.
Can you please help to find the cause of this issue?
I did what you proposed. There was no error after running all cells. All unit tests passed but I still got the same error.
I added the print statement below to the notebook and was able to print the value of "corpus", which means that it is defined:
print(f"Corpus: {corpus[0]}")
Corpus: Harvey Weinstein's 2020 rape conviction overturned Victims group describes the New York appeal court's decision to retry Hollywood mogul as "profoundly unjust".
Just in case auto-save is not activated, could you repeat the same procedure for a clean run, then click Save & Checkpoint before submitting for grading?
I just want to be clear that I have not taken that course, but I am happy to have a look for you. Click on my icon, then Message, and attach a copy of your assignment.
I'm having problems passing the first part of the programming assignment. I'm pretty sure I've got it right. I have checked the solution with ChatGPT and "it" claims my code should run OK. I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[9], line 2
1 # Output is a list of indices
----> 2 bm25_retrieve("What are the recent news about GDP?")
Cell In[8], line 26, in bm25_retrieve(query, top_k)
22 tokenized_query = bm25s.tokenize([query])
24 # Use the 'BM25_RETRIEVER' to retrieve documents and their scores based on the tokenized query
25 # Retrieve the top 'k' documents
---> 26 results, scores = BM25_RETRIEVER.retrieve(tokenized_query, top_k)
29 # Extract the first element from 'results' to get the list of retrieved documents
30 result = results[0]
File /opt/conda/lib/python3.12/site-packages/bm25s/__init__.py:866, in BM25.retrieve(self, query_tokens, corpus, k, sorted, return_as, show_progress, leave_progress, n_threads, chunksize, backend_selection, weight_mask)
864 else:
865 index_flat = indices.flatten().tolist()
--> 866 results = [corpus[i] for i in index_flat]
867 retrieved_docs = np.array(results).reshape(indices.shape)
869 if return_as == "tuple":
TypeError: 'int' object is not subscriptable
Making 'query' into '[query]' was suggested by ChatGPT, but the code behaves the same with or without it.
It seems the bm25s library is very finicky, or your Jupyter notebook environment is a bit unstable.
I had a second problem with the second sub-assignment. It threw an incomprehensible error, which went away after I deleted the function call and retyped an identical one.
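A note on the traceback above: the signature it prints, BM25.retrieve(self, query_tokens, corpus, k, ...), lists corpus as the second parameter. In BM25_RETRIEVER.retrieve(tokenized_query, top_k), the integer top_k is therefore most likely being bound to corpus, and corpus[i] then fails on an int. Passing it by keyword (k=top_k) should avoid that. The sketch below reproduces the mechanism with a hypothetical stand-in function; fake_retrieve, its default values, and the toy tokens are assumptions for illustration, not the bm25s implementation.

```python
def fake_retrieve(query_tokens, corpus=None, k=10):
    """Hypothetical stand-in that copies only the parameter order from the traceback."""
    # Pretend the top-k document indices are simply 0..k-1
    indices = list(range(k))
    if corpus is None:
        return indices
    # Same pattern as the failing line in the traceback
    return [corpus[i] for i in indices]


top_k = 2

# Positional call: top_k lands in the 'corpus' slot
try:
    fake_retrieve(["gdp", "news"], top_k)
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)

# Keyword call: top_k reaches 'k' as intended
keyword_result = fake_retrieve(["gdp", "news"], k=top_k)

print(positional_error)
print(keyword_result)
```

In the notebook, the equivalent change would be BM25_RETRIEVER.retrieve(tokenized_query, k=top_k).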
Jupyter notebook cells should be run sequentially; otherwise, the environment loses synchronization, as each cell depends on the current state of the kernel. Running cells out of sequence disrupts this state, leading to inconsistencies and errors. That is why you should always run your code from the start every time you log back in to your account to resume your work. The same applies when the kernel has been idle for a while.
That's not a recommended strategy; you're working against the design of the notebook itself.
Always run all of the cells from the top, every time you open the notebook. That’s how Jupyter notebooks are intended to be used.
They do not remember the state of your workspace when your session closes. They only save the text in the notebook - not its internal temporary data structures.
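To illustrate the point about kernel state, here is a small sketch that mimics two sessions, using plain dictionaries as namespaces. The cell contents and the NEWS_DATA sample are made up for illustration, and a real kernel is more involved, but the NameError arises the same way.

```python
# Two notebook "cells" stored as text, which is all a saved notebook keeps
cell_1 = (
    "NEWS_DATA = [{'title': 'GDP grows', 'description': 'Quarterly figures released.'}]\n"
    "corpus = [x['title'] + ' ' + x['description'] for x in NEWS_DATA]"
)
cell_2 = "first_doc = corpus[0]"

# Session A: cells run from the top, in order
session_a = {}
exec(cell_1, session_a)
exec(cell_2, session_a)

# Session B: a fresh kernel where only the later cell is run
session_b = {}
try:
    exec(cell_2, session_b)
    error_b = None
except NameError as exc:
    error_b = str(exc)

print(session_a["first_doc"])
print(error_b)
```

Session B fails even though the text of cell_1 is still sitting in the notebook, because nothing has executed it in that session.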
Posting any part of the grader function code that grades your assignment is a gross violation of the Code of Conduct. If a mentor wants to look at your code, they will ask you to send it by private DM; until then, only post a screenshot of your error, failed test, or failed grader submission. Also, as this is your first post in the community, please take this as a gentle initial reminder: instead of posting your query on an older thread, the community guidelines ask you to always create a new topic, even if you find a thread similar to your issue. You can always share a link to the similar post in your new topic.
I used the above code and am still getting an error: ValueError: k of 5 is larger than the number of available scores, which is 1 (corpus size should be larger than top-k). Please set with a smaller k or increase the size of corpus.
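The message says the retriever holds scores for only 1 document while top_k is 5, which suggests the index was built over a single document rather than the full corpus. Since the code is not shown, the cause is a guess; one possibility is that a single string (or a one-element list) was indexed instead of the list of documents. A minimal diagnostic sketch, where check_top_k and the sample documents are hypothetical and not part of bm25s:

```python
def check_top_k(documents, top_k):
    """Hypothetical helper: count the documents that would be indexed and check that top_k fits."""
    if isinstance(documents, str):
        # Assumption: a bare string counts as one document, not a list of them
        n_docs = 1
    else:
        n_docs = len(documents)
    return n_docs, top_k <= n_docs


corpus = ["doc one", "doc two", "doc three", "doc four", "doc five", "doc six"]

print(check_top_k(corpus, 5))       # full list of documents
print(check_top_k(corpus[0], 5))    # a single string
print(check_top_k([corpus[0]], 5))  # a one-element list
```

Printing the length of whatever was passed to the indexing step in the notebook is a quick way to confirm or rule this out.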
I highly recommend that you create a new topic, as described in my earlier comment.
Don't post code on public threads, especially grader function code. You can post a screenshot of your error, failed test, or failed grader submission.
I am closing this thread to avoid confusion for the topic creator: solutions and issues can match, but the code implementation can differ from one learner to another.