Problem compiling the code - Implementing Retriever Functions in a RAG System - C1M2

Hello,

While grading the assignment “C1M2 - Implementing Retriever Functions in a RAG System”, I get the following error for all 3 exercises: “There was a problem compiling the code from your notebook, please check that you saved before submitting. Details: name ‘corpus’ is not defined”. Please find the screenshots attached.

“corpus” is defined as follows:

corpus = [x['title'] + " " + x['description'] for x in NEWS_DATA]

I did save before submitting. All unit tests are passing.

Can you please help to find the cause of this issue?

Hi @lintuyau ,

Try to do a clean run. From the menu bar at the top of your notebook:
Kernel → Restart & Clear Output
Cell → Run All

After checking that there are no errors, submit your assignment.

Hi @Kic ,

I did what you proposed. There was no error after running all cells. All unit tests passed but I still got the same error.

I added the print statement below to the notebook and was also able to print the value of “corpus”, which means that it is defined:

print(f"Corpus: {corpus[0]}")

Corpus: Harvey Weinstein's 2020 rape conviction overturned Victims group describes the New York appeal court's decision to retry Hollywood mogul as "profoundly unjust".

Hi @lintuyau ,

Just in case auto-save is not activated, could you repeat the same clean-run procedure, then click Save & Checkpoint before submitting for grading?

Hi @Kic ,

Sorry, I forgot to mention: I saved the notebook every time before submitting for grading (I also saw the message “Saving completed”). Same issue.

Is there a way to share the notebook for verification?

Hi @lintuyau ,

I just want to be clear that I have not taken that course, but I am happy to have a look for you. Click on my icon, then Message, and attach a copy of your assignment.

To conclude this thread:

The error reported from the grader is due to debugging statements being left in one of the functions.
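The offending code is not shown in this thread, so the following is only a hypothetical illustration of how a leftover debug line could produce this error. It assumes the grader runs the graded functions in a fresh namespace where notebook-level variables such as `corpus` do not exist; the function name `retrieve_debug` and the simulated grader are invented for the example.

```python
# Hypothetical sketch: a graded function with a leftover debug line that
# reads a notebook-level variable.
FUNCTION_SOURCE = '''
def retrieve_debug(query, top_k=2):
    print(f"Corpus: {corpus[0]}")  # leftover debug line; needs global `corpus`
    return list(range(top_k))
'''


def run_in_namespace(namespace):
    """Define and call the function inside the given namespace."""
    exec(FUNCTION_SOURCE, namespace)
    return namespace["retrieve_debug"]("GDP news")


# In the notebook, `corpus` exists, so the function runs without error.
notebook_ns = {"corpus": ["first document"]}
notebook_result = run_in_namespace(notebook_ns)

# Assumed grader behaviour: a fresh namespace that does not define `corpus`.
try:
    run_in_namespace({})
    grader_error = None
except NameError as exc:
    grader_error = str(exc)

print(notebook_result)
print(grader_error)
```

Under that assumption, the notebook and its unit tests pass while the grader fails, which matches the symptoms described above. Removing the debug line, or passing everything the function needs as arguments, avoids the dependency on notebook globals.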


I’m having problems passing the first part of the programming assignment. Pretty sure I’ve got it right. Have checked the solution on ChatGPT and “it” claims my code should run OK. Get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 2
      1 # Output is a list of indices
----> 2 bm25_retrieve("What are the recent news about GDP?")

Cell In[8], line 26, in bm25_retrieve(query, top_k)
     22 tokenized_query = bm25s.tokenize([query])
     24 # Use the 'BM25_RETRIEVER' to retrieve documents and their scores based on the tokenized query
     25 # Retrieve the top 'k' documents
---> 26 results, scores = BM25_RETRIEVER.retrieve(tokenized_query, top_k)
     29 # Extract the first element from 'results' to get the list of retrieved documents
     30 result = results[0]

File /opt/conda/lib/python3.12/site-packages/bm25s/__init__.py:866, in BM25.retrieve(self, query_tokens, corpus, k, sorted, return_as, show_progress, leave_progress, n_threads, chunksize, backend_selection, weight_mask)
    864     else:
    865         index_flat = indices.flatten().tolist()
--> 866         results = [corpus[i] for i in index_flat]
    867         retrieved_docs = np.array(results).reshape(indices.shape)
    869 if return_as == "tuple":

TypeError: 'int' object is not subscriptable

Hi @bengtwalerud ,

At line 22, query is a str. Wrapping it in [ ] turns it into a list with one element, which I think is causing the error.

Thanks for your prompt reply!

I had a very frustrating time resolving this problem, but the solution was to change the call on line 26 above to pass top_k as a keyword argument:

results, scores = BM25_RETRIEVER.retrieve(tokenized_query, k=top_k)

After that everything worked fine.
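The signature printed in the traceback, `retrieve(self, query_tokens, corpus, k, ...)`, suggests why the keyword matters: the second positional parameter is `corpus`, not `k`, so a positional `top_k` ends up being indexed as if it were the corpus. The sketch below reproduces that with a stand-in function; `retrieve_stub` is hypothetical and only mimics the leading parameter order, it is not the bm25s implementation.

```python
# Stand-in with the same leading parameter order as the signature shown in
# the traceback: (query_tokens, corpus, k, ...). Not the real bm25s code.
def retrieve_stub(query_tokens, corpus=None, k=10):
    indices = list(range(k))  # pretend these are the top-k document indices
    if corpus is None:
        return indices  # no corpus given: return the indices themselves
    return [corpus[i] for i in indices]  # corpus given: look up documents


top_k = 3

# Keyword argument: top_k reaches `k` as intended.
ok = retrieve_stub(["gdp", "news"], k=top_k)

# Positional argument: top_k is bound to `corpus`, so corpus[i] indexes an int.
try:
    retrieve_stub(["gdp", "news"], top_k)
    error = None
except TypeError as exc:
    error = str(exc)

print(ok)
print(error)
```

The second call fails with the same `TypeError` as in the traceback, regardless of whether the query was wrapped in a list.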

Changing ‘query’ to ‘[query]’ was suggested by ChatGPT, but the code works with or without it.

It seems the bm25s library is very finicky, or else your Jupyter notebook environment is a bit unstable.

I had a second problem with the second sub-assignment. It threw an incomprehensible error, which was resolved by deleting the function call and retyping an identical one.

I have passed the graded test now.

Bengt

Hi @bengtwalerud ,

Jupyter notebook cells should be run sequentially, because each cell depends on the current state of the kernel. Running cells out of sequence disrupts this state, leading to inconsistencies and errors. That is why you should always run your code from the start every time you log back in to your account to resume your work. The same applies when the kernel has been idle for a while.

Do not put too much faith that a chat tool will know very much about the programming solutions for these courses.

That’s not a recommended strategy; you’re working against the design of the notebook itself.

Always run all of the cells from the top, every time you open the notebook. That’s how Jupyter notebooks are intended to be used.

They do not remember the state of your workspace when your session closes. They only save the text in the notebook - not its internal temporary data structures.
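A rough way to picture this, as a sketch rather than how Jupyter is actually implemented: treat each cell as a string of code and the kernel's memory as a dictionary. The cell contents and the sample news item below are made up for illustration.

```python
# Each "cell" is a string of code; a dict plays the role of the kernel's memory.
cells = [
    "NEWS_DATA = [{'title': 'GDP rises', 'description': 'Quarterly growth up.'}]",
    "corpus = [x['title'] + ' ' + x['description'] for x in NEWS_DATA]",
    "first_doc = corpus[0]",
]

# Run all cells from the top: every name is defined by the time it is needed.
kernel = {}
for cell in cells:
    exec(cell, kernel)
first_doc = kernel["first_doc"]

# New session: the saved notebook text is still there, but memory is empty.
# Running only the last cell fails because `corpus` was never rebuilt.
fresh_kernel = {}
try:
    exec(cells[2], fresh_kernel)
    error = None
except NameError as exc:
    error = str(exc)

print(first_doc)
print(error)
```

The text of the three cells is identical in both sessions; only the in-memory state differs, which is why running everything from the top restores a working notebook.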
