Problem compiling the code - Implementing Retriever Functions in a RAG System - C1M2

Hello,

While grading the assignment “C1M2 - Implementing Retriever Functions in a RAG System”, I get the following error for all 3 exercises: “There was a problem compiling the code from your notebook, please check that you saved before submitting. Details: name 'corpus' is not defined”. Please find the screenshots attached.

“corpus” is defined as follows:

corpus = [x['title'] + " " + x['description'] for x in NEWS_DATA]

I did save before submitting. All unit tests are passing.

Can you please help to find the cause of this issue?

Hi @lintuyau ,

Try to do a clean run. From the menu bar at the top of your notebook:
Kernel -> Restart & Clear Output
Cell -> Run All

After checking that there are no errors, submit your assignment.

Hi @Kic ,

I did what you proposed. There was no error after running all cells. All unit tests passed but I still got the same error.

I added the print statement below to the notebook and was able to print the value of “corpus”, which means that it is defined:

print(f"Corpus: {corpus[0]}")

Corpus: Harvey Weinstein's 2020 rape conviction overturned Victims group describes the New York appeal court's decision to retry Hollywood mogul as "profoundly unjust".

Hi @lintuyau ,

Just in case auto-save is not activated, could you repeat the same clean-run procedure, then click Save & Checkpoint before submitting for grading?

Hi @Kic ,

Sorry, I forgot to mention: I saved the notebook every time before submitting for grading (I also saw the message “Saving completed”). Same issue.

Is there a way to share the notebook for verification?

Hi @lintuyau ,

I just want to be clear that I have not taken that course, but I am happy to have a look for you. Click on my icon, then Message, and attach a copy of your assignment.

To conclude this thread:

The error reported from the grader is due to debugging statements being left in one of the functions.
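For anyone hitting the same message, here is one way a leftover debug line could produce it. This is only a sketch under assumptions: the thread does not show the actual statement, the function name `retrieve_titles` is hypothetical, and it assumes the grader runs the function in a namespace where the notebook-level `corpus` does not exist.

```python
# Hypothetical graded function with a leftover debug line (not the real assignment code).
FUNCTION_SOURCE = '''
def retrieve_titles(documents, top_k=2):
    print(f"Corpus: {corpus[0]}")   # leftover debug line: reads a notebook-level global
    return documents[:top_k]
'''

def run_in_namespace(namespace):
    """Define the function inside `namespace`, call it, and report any NameError."""
    exec(FUNCTION_SOURCE, namespace)
    try:
        return namespace["retrieve_titles"](["a", "b", "c"])
    except NameError as err:
        return f"NameError: {err}"

# In the notebook, `corpus` exists, so the debug line is harmless.
notebook_result = run_in_namespace({"corpus": ["doc one", "doc two"]})

# In a namespace without `corpus` (assumed here to resemble the grader), the call fails.
grader_result = run_in_namespace({})
```

This would explain why the notebook runs cleanly and the unit tests pass while the grader still reports that `corpus` is not defined.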

I’m having problems passing the first part of the programming assignment. Pretty sure I’ve got it right. Have checked the solution on ChatGPT and “it” claims my code should run OK. Get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 2
      1 # Output is a list of indices
----> 2 bm25_retrieve("What are the recent news about GDP?")

Cell In[8], line 26, in bm25_retrieve(query, top_k)
     22 tokenized_query = bm25s.tokenize([query])
     24 # Use the 'BM25_RETRIEVER' to retrieve documents and their scores based on the tokenized query
     25 # Retrieve the top 'k' documents
---> 26 results, scores = BM25_RETRIEVER.retrieve(tokenized_query, top_k)
     29 # Extract the first element from 'results' to get the list of retrieved documents
     30 result = results[0]

File /opt/conda/lib/python3.12/site-packages/bm25s/__init__.py:866, in BM25.retrieve(self, query_tokens, corpus, k, sorted, return_as, show_progress, leave_progress, n_threads, chunksize, backend_selection, weight_mask)
    864     else:
    865         index_flat = indices.flatten().tolist()
--> 866         results = [corpus[i] for i in index_flat]
    867         retrieved_docs = np.array(results).reshape(indices.shape)
    869 if return_as == "tuple":

TypeError: 'int' object is not subscriptable

Hi @bengtwalerud ,

On line 22, query is a str. Wrapping it in [ ] turns query into a list of one element, which is causing the error.

Thanks for your prompt reply!

I had a very frustrating time resolving this problem, but the solution was to change the call above to:

results, scores = BM25_RETRIEVER.retrieve(tokenized_query, k=top_k)

After that everything worked fine.
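The traceback above lists the parameters as `retrieve(self, query_tokens, corpus, k, ...)`, so a second positional argument is bound to `corpus` rather than `k`. Here is a minimal sketch of that binding, using a stand-in function rather than the real bm25s code (the default values and the fake index list are assumptions for illustration):

```python
# Stand-in with the same leading parameter order the traceback shows for
# BM25.retrieve (query_tokens, corpus, k). This is NOT the bm25s implementation.
def retrieve(query_tokens, corpus=None, k=10):
    indices = list(range(k))              # pretend these are the top-k document indices
    if corpus is None:
        return indices
    return [corpus[i] for i in indices]   # same indexing step that fails in the traceback

def call_positionally(top_k):
    # top_k lands in the `corpus` slot, so corpus[i] tries to index into an int
    return retrieve(["gdp"], top_k)

def call_with_keyword(top_k):
    # top_k is bound to `k`, as intended
    return retrieve(["gdp"], k=top_k)
```

With this stand-in, `call_positionally(5)` raises `TypeError: 'int' object is not subscriptable`, while `call_with_keyword(5)` returns five indices.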

Changing 'query' to '[query]' was suggested by ChatGPT, but the code works with or without it.

It seems the bm25s library is very finicky, or your Jupyter notebook environment is a bit unstable.

I had a second problem with the second sub-assignment. It threw an incomprehensible error, which was resolved by deleting the function call and replacing it with an identical one.

I have passed the graded test now.

Bengt

Hi @bengtwalerud ,

Jupyter notebook cells should be run sequentially; otherwise, the environment loses synchronization, as each cell depends on the current state of the kernel. Running cells out of sequence disrupts this state, leading to inconsistencies and errors. That is why you should always run your code from the start every time you log back in to your account to resume your work. The same applies when the kernel has been idle for a while.

Do not put too much faith that a chat tool will know very much about the programming solutions for these courses.

That’s not a recommended strategy; you’re working against the design of the notebook itself.

Always run all of the cells from the top, every time you open the notebook. That’s how Jupyter notebooks are intended to be used.

They do not remember the state of your workspace when your session closes. They only save the text in the notebook - not its internal temporary data structures.
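As a rough illustration of why run order matters, the sketch below simulates notebook cells with plain strings and the kernel state with a dict. It is a simplified model and not how Jupyter is implemented; the `NEWS_DATA` contents are made up for the example.

```python
# Each string stands in for one notebook cell; the dict stands in for kernel state.
cells = [
    "NEWS_DATA = [{'title': 'GDP grows', 'description': 'Quarterly figures released.'}]",
    "corpus = [x['title'] + ' ' + x['description'] for x in NEWS_DATA]",
    "first_doc = corpus[0]",
]

def run_cells(order):
    """Run the given cells in a fresh 'kernel' and return its final state."""
    kernel_state = {}
    for i in order:
        exec(cells[i], kernel_state)
    return kernel_state

def try_run(order):
    """Return first_doc, or the NameError message if a cell ran too early."""
    try:
        return run_cells(order)["first_doc"]
    except NameError as err:
        return f"NameError: {err}"

in_order = try_run([0, 1, 2])   # all cells, top to bottom
skipped = try_run([2])          # fresh kernel, only the last cell
```

Running all cells from the top defines `corpus` before it is used; running only the last cell in a fresh kernel gives `name 'corpus' is not defined`.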

Posting any part of the grader function code that grades your assignment is a gross violation of the Code of Conduct. If a mentor wants to look at your code, they will ask you to send it by private DM; until then, only post a screenshot of your error, failed test, or failed submission grader output. Also, as this is your first post in the community, please take this as a gentle reminder: instead of posting your query on an older thread, the community guidelines ask you to always create a new topic, even if you find a thread similar to your issue. You can always share the link to the similar post in your new topic.

I used the code above but am still getting this error: ValueError: k of 5 is larger than the number of available scores, which is 1 (corpus size should be larger than top-k). Please set with a smaller k or increase the size of corpus.

Hi @kharade.navin ,

I highly recommend that you create a new topic, following the guidance in my earlier comment.

Don’t post code on public threads, especially grader function code. You can post a screenshot of your error, failed test, or failed submission grader output.

I am closing this thread to avoid confusion for the topic creator: the issues and solutions may look alike, but the code implementation can differ from one learner to another.

Regards

Dr. Deepti

Ok cool

@kharade.navin

Please post a new topic for your issue; I will respond there.

Regards

Dr. Deepti