Assignment 4 (UNQ_C21) approximate_knn - Inconsistent data type for new_id (int vs. ndarray)

I’ve encountered an issue in the Week 4 assignment notebook within the approximate_knn function (Cell 126, UNQ_C21) that appears to stem from inconsistent test data.

When running the test cell (Cell 128, UNQ_C22), the approximate_knn function fails with an error related to the new_id variable.

To debug, I added print(f'{type(new_id)}') inside the loop. This showed that new_ids_to_consider (which comes from id_table[hash_value]) contains a mix of data types:

<class 'int'>
<class 'int'>

<class 'numpy.ndarray'>

This data inconsistency causes an error at if doc_id == new_id:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This happens when doc_id (an int) is compared to a new_id that is a numpy.ndarray.
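
For anyone hitting the same message, here is a minimal sketch that reproduces it outside the notebook. The values of doc_id and new_id are made up for illustration and are not taken from the assignment data; the only assumption is that new_id is an ndarray with more than one element.

```python
import numpy as np

doc_id = 3                     # hypothetical document id (a plain int)
new_id = np.array([3, 7])      # hypothetical ndarray standing in for the bad entry

# Comparing an int with an ndarray is elementwise, so the result is an array.
comparison = doc_id == new_id

# Using that array as an if-condition forces a single truth value,
# which NumPy refuses to guess for arrays with more than one element.
try:
    if comparison:
        pass
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

print(comparison)
print(raised, message)
```

So the comparison itself succeeds; it is the if statement that raises once the result is an array rather than a single bool.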

Is this data inconsistency in the id_tables intentional?

Thank you.

I did not encounter that problem in that assignment. My guess is that there is some other bug in your earlier logic that is causing that issue.

I’m rerunning that assignment, but the training takes quite a while. I’ll respond with my version of the instrumentation when it finishes.

Yes, when I added your instrumentation, all the values were of type int.

So the error you show must be caused by something going wrong in your earlier logic.
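
One way to narrow that down is to scan the table for non-integer entries right after it is built, rather than waiting for the comparison to fail. This is only a rough sketch: it assumes id_table is a dict mapping each hash value to a list of ids, and find_non_int_ids and toy_table are hypothetical names that are not part of the assignment code.

```python
import numbers

import numpy as np


def find_non_int_ids(id_table):
    """Return (hash_value, position, type name) for each entry that is not an integer.

    Assumes id_table is a dict of hash_value -> list of ids (an assumption
    about the notebook's data structure, not confirmed here).
    """
    problems = []
    for hash_value, ids in id_table.items():
        for position, doc_id in enumerate(ids):
            # numbers.Integral covers Python int and NumPy integer scalars.
            if not isinstance(doc_id, numbers.Integral):
                problems.append((hash_value, position, type(doc_id).__name__))
    return problems


# Toy table with one deliberately bad entry, for illustration only.
toy_table = {0: [1, 2], 1: [3, np.array([4, 5])]}
problems = find_non_int_ids(toy_table)
print(problems)
```

If a check like this reports anything on your own tables, the bucket and position should point to where the ndarray was appended, which is probably where the earlier logic went wrong.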

Update: also note that you filed this under NLP Course 3, but it is NLP Course 1. I’ve edited the title using the little “edit pencil” to fix the categorization.

Thanks for the reply! Sorry for using the wrong labels here.

Now that I know for sure it’s on my side, I’ll double check my work.

Great! Let us know how it goes. One thing to note is that they gave us most of the code here, especially the part that manages all the complex input data structures. A first thing to check is that you didn’t mess with any of that given logic. You can get a clean copy and compare that code.