NLP C1_W1_Assignment : Frequency Dictionary missing a few entries?

  • Week#1

  • NLP C1_W1_Assignment:

  • Main Issue :
    In the NLP C1_W1_Assignment, in the “Prepare the Data” section, I get the following mismatch (also highlighted in the screenshot):

len(freqs) = 11406
does not match the Expected output:
len(freqs) = 11436

Please note: this is before any of the exercises or any additional changes on my part. I followed the instructions to request a fresh copy to be sure.

I was hoping it was a typo so I went ahead.

I passed all the tests until exercise 4, passing 11 tests but failing 5.
Also in Section 3 - Training your Model, my cost and trained weights were always off by a little:
My cost after training is 0.22525459.
My resulting vector of weights is [6e-08, 0.00053785, -0.00055884]

Expected Output:
The cost after training is 0.22522315.
The resulting vector of weights is [6e-08, 0.00053818, -0.0005583]

I’d really appreciate it if someone could clarify the first issue of the missing frequency dictionary entries. :pray:

Yes, I just reran my notebook that used to pass all the tests and the grader and I get the same error that you show. In addition to that mismatching expected length, several of the unit tests in the notebook now fail for me.

This must be another side effect of the nltk data being changed. Here’s a recent thread about similar problems in the NLP C1 W2 assignment. Mentor @Deepti_Prasad reported that other problem to the course staff, but we have not yet heard back from them.

Deepti, is it possible to add a note to your previous report suggesting that they scan all the NLP courses to see which assignments depend on imported NLTK data?


hi @paulinpaloalto

I will make sure to add this week too.

The nltk data issue actually started with the Course 3 Week 1 assignment, which was addressed a few months ago. I will make sure to report this assignment and, more generally, the nltk data version (it probably needs to be updated, or pinned to the version the assignment was created with).

@lkj thank you for reporting this.

Regards
DP


thanks @paulinpaloalto @Deepti_Prasad for the quick response.

Hi, Deepti.

Thanks very much for working with the course staff to get this fixed. I’m not familiar with the nltk website, but maybe we should encourage them to find a “permanent” solution to this issue. Meaning a way that the assignments can ask for a particular version of the data or if the data cannot be guaranteed to be stable, then they just need to import the copies to make them fixed.
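As a rough illustration of the "fixed copy" idea, here is a minimal sketch that freezes a word list and refuses to run if the list no longer matches a recorded fingerprint. The names `fingerprint`, `load_frozen_stopwords`, `FROZEN`, and `FROZEN_DIGEST` are hypothetical and are not part of the assignment or of nltk; the tiny word list is a stand-in for whatever data the notebook would bundle.

```python
import hashlib
import json


def fingerprint(words):
    """Return a SHA-256 hex digest of a word list, independent of order."""
    canonical = json.dumps(sorted(set(words))).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def load_frozen_stopwords(words, expected_digest):
    """Return the words as a set, or raise if they differ from the frozen copy."""
    actual = fingerprint(words)
    if actual != expected_digest:
        raise ValueError(
            f"stopword list changed: {actual[:12]} != {expected_digest[:12]}"
        )
    return set(words)


# Hypothetical frozen list shipped alongside the notebook (assumption).
FROZEN = ["a", "an", "the", "is"]
# In practice this digest would be recorded once and hard-coded.
FROZEN_DIGEST = fingerprint(FROZEN)

stopwords = load_frozen_stopwords(FROZEN, FROZEN_DIGEST)

# A later, silently changed list fails loudly instead of shifting results.
try:
    load_frozen_stopwords(FROZEN + ["not"], FROZEN_DIGEST)
except ValueError as err:
    print("detected:", err)
```

With a check like this, a change in the upstream data would surface as an explicit error rather than as a slightly different dictionary length or cost.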

Regards,
Paul


hi @paulinpaloalto

I suggested the same solution, versioning the nltk data, when I informed the l.t. of the course :slightly_smiling_face:

Regards
DP


hi @lkj

As per a recent update by @lucas.coutinho, the problem with process tweet has been addressed through a stopword metadata correction. Please close and reopen the lab to see the changes.

Regards
DP


Hi @Deepti_Prasad
I ran the updated lab but still get the mismatch (see screenshot below)
Should I still try logging out and getting a fresh lab? Or is my output now the actual “expected output”?


@lucas.coutinho

Can you please check the Week 1 assignment once again as well?

Regards
DP

Try refreshing your classroom page and then open the lab, does it still give the same issue?

yes.
(just to confirm: I did indeed get the pop up that said unittest had been changed when I opened the updated file this morning)

I ended up patching the notebook update as a silent update, so if you want to get the updated copy of it, you may refresh your workspace as usual to get a new copy, but it shouldn’t impact the unittests or the grader.


The freqs dictionary is still off. Is that OK?

type(freqs) = <class 'dict'>
len(freqs) = 11406

Expected output

type(freqs) = <class 'dict'>
len(freqs) = 11436

@lkj did the subsequent unittest fail again? Can you run the notebook down to the unittest and confirm?


no, it passed - thanks!


It was just the “Expected output” markdown part that was misleading.

hi @lkj

According to the recent update, the freqs count differs because of the stopwords used by process tweet. This is currently addressed by changes made to the metadata of the assignment notebook.
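To see why a different stopword list changes `len(freqs)`, here is a minimal sketch. It uses a simplified stand-in, `build_freqs_sketch`, that only lowercases and splits on whitespace; it is not the assignment's own preprocessing, and both stopword sets and the two example tweets are made up for illustration.

```python
from collections import defaultdict


def build_freqs_sketch(tweets, labels, stopwords):
    """Count (word, label) pairs, skipping any word found in `stopwords`."""
    freqs = defaultdict(int)
    for tweet, label in zip(tweets, labels):
        for word in tweet.lower().split():
            if word not in stopwords:
                freqs[(word, label)] += 1
    return dict(freqs)


# Made-up data: one positive (1) and one negative (0) tweet.
tweets = ["I am not happy today", "she is not sad at all"]
labels = [1, 0]

# Two hypothetical versions of a stopword list; the newer one is larger.
stopwords_old = {"i", "am", "is", "at", "all"}
stopwords_new = stopwords_old | {"not", "she"}

freqs_old = build_freqs_sketch(tweets, labels, stopwords_old)
freqs_new = build_freqs_sketch(tweets, labels, stopwords_new)

print(len(freqs_old), len(freqs_new))  # 6 3
# Keys that disappear once the stopword list grows:
print(sorted(set(freqs_old) - set(freqs_new)))
# [('not', 0), ('not', 1), ('she', 0)]
```

Every word added to the stopword list removes up to one key per label, so a small change in the list can account for a small drop in the dictionary length. Whether that is exactly what happened in this assignment is not confirmed by this sketch.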

This is a temporary fix, because changing how the data is processed would also require autograder changes, which might take more time.

So the l.t. @lucas.coutinho resolved the current issue this way, since changing both the autograder and process tweet in the assignment would take longer.

Please go ahead for now, as you should not have any problem with submission or the unittest cell. If you encounter any such issue, please let us know. Thank you again for reporting this and for being patient while the staff addressed the issue.

Regards
DP


thanks very much @Deepti_Prasad @paulinpaloalto @lucas.coutinho for your help :slightly_smiling_face:
