NLP C1_W4_Assignment UNQ_C21 bug?

The comment says to remove an element:

        # remove the id of the document that we're searching
        if doc_id in new_ids_to_consider:
            new_ids_to_consider.remove(doc_id)
            print(f"removed doc_id {doc_id} of input vector from new_ids_to_search")

I believe this is wrong. If we change new_ids_to_consider without also changing document_vectors_l, the result will be incorrect. Elements in a hash_table are aligned by index with elements in the corresponding id_table, and removing an element breaks that alignment.

Also, removing elements from either new_ids_to_consider or document_vectors_l changes the planes_l structure globally, so the function is not re-entrant unless planes_l is rebuilt on every call.
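To make that concrete, here is a minimal sketch (the id_table layout is hypothetical; the aliasing is the point). Because new_ids_to_consider is a reference into the table, not a copy, removing from it mutates the table itself:

        # hypothetical id_table: hash-bucket index -> list of doc ids
        id_table = {0: [1, 2, 3]}

        # the function receives a reference into the table, not a copy
        new_ids_to_consider = id_table[0]

        new_ids_to_consider.remove(1)  # mutates the shared list in place
        print(id_table[0])             # [2, 3] -- the table changed globally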

The document whose neighbors we are searching for should simply be skipped instead, for example:

        # loop through the subset of document vectors to consider
        for i, new_id in enumerate(new_ids_to_consider):
            if doc_id == new_id:
                continue
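Combining this skip with the append loop quoted later in the thread, the fixed block would look roughly like this (a sketch; vecs_to_consider_l and ids_to_consider_l come from the assignment's surrounding code):

        # loop through the subset of document vectors to consider
        for i, new_id in enumerate(new_ids_to_consider):
            # skip the query document instead of removing it,
            # so that index i stays aligned with document_vectors_l
            if doc_id == new_id:
                continue
            document_vector_at_i = document_vectors_l[i]
            vecs_to_consider_l.append(document_vector_at_i)
            ids_to_consider_l.append(new_id)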

Hi ruzikarov,

Your argument makes sense to me. I’ll report this to people working on the backend. Thanks!

@ruzakirov What you proposed is another way of implementing it. In both methods you avoid processing the case doc_id == new_id, whether you accomplish that with an if/continue or by removing the id from the list altogether.

@Shantimohan_Elchuri, you’re wrong. The two are not equivalent; they produce different results. Let me explain differently. Say new_ids_to_consider = [1, 2, 3], document_vectors_l = [v1, v2, v3], and doc_id = 1. After the deletion, new_ids_to_consider = [2, 3], but document_vectors_l is still [v1, v2, v3]. Later in the code we do the following:

        for i, new_id in enumerate(new_ids_to_consider):
            document_vector_at_i = document_vectors_l[i]
            vecs_to_consider_l.append(document_vector_at_i)
            ids_to_consider_l.append(new_id)

On the first iteration, i = 0, new_id = 2, and document_vector_at_i = v1, so id 2 gets paired with vector v1. I hope you see the mistake.
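A tiny standalone repro of that off-by-one (the vector values are placeholder strings):

        new_ids_to_consider = [1, 2, 3]
        document_vectors_l = ["v1", "v2", "v3"]  # aligned by index with the ids
        doc_id = 1

        new_ids_to_consider.remove(doc_id)  # ids are now [2, 3]

        for i, new_id in enumerate(new_ids_to_consider):
            print(new_id, document_vectors_l[i])
        # prints "2 v1" then "3 v2": every id is paired with the wrong vector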

Hi,
I’m also confused by the instruction asking us to remove doc_id from new_ids_to_consider. As @ruzakirov said, doing so would globally change the list in the hash table and put document_vectors_l’s indices out of sync. I don’t think that is correct either.
I don’t see any follow-up after post #4 above from Nov ’21, but as of March ’22 I still see the lines asking us to remove doc_id from new_ids_to_consider. Can the mentors clarify whether we should do that?

Thanks.

So is it because things sometimes do not work well on the backend that the grader conflicts with the unit tests and expected outputs?

Can you ask them to verify that the grader for hash_value_of_vector is working properly? My output matches the expected output and the unit tests pass, but the grader gives me 0/10 for that exercise.

Hi David_Simmonds,

This is not necessarily the case, as the unit tests and the expected outputs do not test everything. So the grader may catch a bug in your code that the unit tests do not.

The other thing to be careful about here is something that David and I have been discussing in a DM thread:

If you are working in a renamed copy of the notebook, hitting “Submit” does not grade your current notebook: it grades the “standard one”, meaning the one that is opened by the “Work in Browser” link. I would totally agree if you claimed that this is an egregious violation of the vaunted “Principle of Least Astonishment” from the UX world. But that’s the way it works, so it’s important to understand that.

So if the grader results don’t make sense, it could be that the grader is looking at different code than you are.

Yes Reinoud,

As Paul said, if you’re using a renamed version of the assignment, the grader still expects the original file name. It may seem obvious that the grader expects a particular file name, but it seems equally obvious that whatever file is loaded in memory would be the file submitted to the grader.

In any case, the grader is now doing the opposite for my assignment: giving me a 10/10 for make_hash_table that I did not earn, even though the unit test failed and my output differs from the expected output.

I strongly suggest that someone go through all the assignments to ensure that the answer key gets a perfect grade from the grader (grading the grader, if you will). The notebooks and grader should also be checked regularly to ensure that the expected outputs are in sync (to however many decimal places) with newer versions of Python, and that the code supplied in the notebook is not deprecated.

David

Hi David,

I did not know this issue existed. Definitely something to look into; good that you have been discussing this with Paul.

Thanks!

Yes, Paul has really taken the bull by the horns here.