Issue in running C2_W4_Assignment on a personal PC

Shantimohan_Elchuri · February 21, 2022, 6:50pm

When I run this assignment on my personal PC, the notebook runs to the end without stopping, even though cell #23 reports errors: in cell #23, w4_unittest.test_gradient_descent passes only 2 tests and fails 14.

Tracing back, I found that the output of cell #3 shows a Number of tokens of 60976 on my PC, whereas it is 60996 on the Coursera server. This results in a vocabulary size of 5775 on my PC versus 5778 on the Coursera server. I got the same results on my Mac as well.
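A quick way to see where a local run diverges is to print those two numbers next to the server's values. The sketch below is a minimal, hypothetical helper (`corpus_stats` and `compare_to_server` are illustrative names, not part of the assignment); it assumes the vocabulary size is simply the number of distinct tokens in the list produced by cell #3.

```python
# Values reported in this thread for the Coursera server (nltk 3.5).
SERVER_TOKENS = 60996  # "Number of tokens" printed by cell #3
SERVER_VOCAB = 5778    # vocabulary size


def corpus_stats(tokens):
    """Return (number of tokens, vocabulary size).

    Assumption: the vocabulary is the set of distinct tokens."""
    return len(tokens), len(set(tokens))


def compare_to_server(tokens):
    """Print both counts next to the server's; return True if the vocabulary sizes match."""
    n_tokens, vocab_size = corpus_stats(tokens)
    print(f"tokens: {n_tokens} (server: {SERVER_TOKENS}, diff: {n_tokens - SERVER_TOKENS:+d})")
    print(f"vocab:  {vocab_size} (server: {SERVER_VOCAB}, diff: {vocab_size - SERVER_VOCAB:+d})")
    return vocab_size == SERVER_VOCAB
```

In the notebook this would be called as `compare_to_server(data)`, assuming the token list from cell #3 is held in a variable such as `data`.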

Meanwhile, I had downgraded nltk from version 3.6.5 to 3.4.5, as suggested in this forum post, to troubleshoot a similar error I was getting in C1_W2_Assignment. With that version, C2_W4_Assignment gives a Number of tokens of 60975 and a vocabulary size of 5777.

When I then upgraded nltk from 3.4.5 to 3.5 (the version on the Coursera servers for this assignment), I got the same numbers as on the Coursera server and the unit tests passed.

So my first request is for some of you to confirm this by running C2_W4_Assignment on your own machines.

My second request is for suggestions on what exactly we have to do to run this assignment with nltk 3.6.5, the latest version. Clearly, different nltk versions are not backward compatible in how they tokenize this text, but there must be a way to process it correctly. With 60K+ tokens, comparing the generated tokens between the two versions by hand would be a laborious process.
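Rather than comparing the tokens by eye, one option is to save the token list from each nltk version and let `collections.Counter` report the differences. This is a minimal sketch: the helper names and JSON file names are hypothetical, and it assumes both token lists come from the same cell #3 code run under the two nltk versions.

```python
import json
from collections import Counter


def save_tokens(tokens, path):
    """Write a token list (e.g. the output of cell #3) to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(list(tokens), f)


def load_tokens(path):
    """Read a token list previously written by save_tokens."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def diff_tokens(tokens_a, tokens_b):
    """Return (only_in_a, only_in_b): Counters of the surplus occurrences of each token."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    # Counter subtraction keeps only positive counts, i.e. the surplus on each side.
    return count_a - count_b, count_b - count_a


# Hypothetical usage (file names are illustrative):
#   in the nltk 3.5 environment:   save_tokens(data, "tokens_nltk35.json")
#   in the nltk 3.6.5 environment: save_tokens(data, "tokens_nltk365.json")
#   only_35, only_365 = diff_tokens(load_tokens("tokens_nltk35.json"),
#                                   load_tokens("tokens_nltk365.json"))
#   print(only_35.most_common(20))
#   print(only_365.most_common(20))
```

Only the tokens whose counts differ are reported, so a handful of lines replaces a scan through 60K+ tokens.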

ThanQ… in advance…

balaji.ambresh · February 23, 2022, 6:50am

You’re right about the difference in vocab sizes.
Since Coursera allows the labs to be downloaded, it would be good to list all of the lab's dependencies in a requirements.txt file so that learners can create and use their own virtual environments. @Community-Team, please look into this.

All tests are written against a particular environment setup. So you can still run your code on nltk 3.6.5, but the tests will fail until they are updated.
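Until the tests are updated, a simple guard at the top of a local copy can warn when the installed nltk differs from the version the grader expects. This sketch assumes, per this thread, that the Coursera servers run nltk 3.5; it reads the installed version with `importlib.metadata.version` (Python 3.8+), so nltk does not need to be imported. The matching version can be installed with `pip install "nltk==3.5"`, or pinned with an `nltk==3.5` line in a requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Version the Coursera servers use for this assignment, according to this thread.
GRADER_NLTK = "3.5"


def nltk_matches_grader(installed=None, expected=GRADER_NLTK):
    """Return True if the nltk version is `expected` or a patch release of it (3.5.*)."""
    if installed is None:
        try:
            installed = version("nltk")  # reads package metadata; does not import nltk
        except PackageNotFoundError:
            return False  # nltk is not installed at all
    return installed == expected or installed.startswith(expected + ".")


if not nltk_matches_grader():
    print(f"Warning: nltk {GRADER_NLTK}.* not found; unit-test counts may differ.")
```

The prefix check accepts any 3.5.* release but rejects look-alikes such as "3.50" or "3.6.5".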

Shantimohan_Elchuri · February 24, 2022, 4:13pm

If you have nltk 3.6.5, you can make all tests in cell #23 pass by adding the following lines to cell #3, right after reading the data from the shakespeare.txt file:

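```python
# Workaround for nltk 3.6.5: pre-split these contractions so the vocabulary matches nltk 3.5
data = data.replace("stol'n", "stol ' n")
data = data.replace("Ev'n", "Ev ' n")
data = data.replace("fall'n", "fall ' n")
```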

Compared with nltk 3.5 (on the Coursera server), this gives a Number of tokens of 60992, which is 4 fewer than on the Coursera server. However, the final results are the same, because the vocabulary size is the same (5778).

Between nltk 3.5 and 3.6.5, these 3 words are tokenized differently. In nltk 3.5, each of them is split into 3 separate tokens, whereas in nltk 3.6.5 each remains a single token. I think what nltk 3.6.5 does is correct. However, the isalpha() filter applied next removes these single tokens altogether, because the apostrophe makes them non-alphabetic.
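The effect of the isalpha() filter can be illustrated without nltk at all. The token lists below are hand-written for a made-up phrase, assuming nltk 3.5 splits a contraction such as stol'n into `stol`, an apostrophe, and `n` (as the replacement strings in the workaround suggest) while nltk 3.6.5 keeps it whole; they are not output from a real nltk run.

```python
def keep_alpha(tokens):
    """Keep only purely alphabetic tokens, like the isalpha() filter described above."""
    return [t for t in tokens if t.isalpha()]


# Illustrative tokenizations of the made-up phrase "thou hast stol'n it".
tokens_nltk35 = ["thou", "hast", "stol", "'", "n", "it"]  # contraction split into 3 tokens
tokens_nltk365 = ["thou", "hast", "stol'n", "it"]         # contraction kept as 1 token

kept_35 = keep_alpha(tokens_nltk35)    # ['thou', 'hast', 'stol', 'n', 'it']
kept_365 = keep_alpha(tokens_nltk365)  # ['thou', 'hast', 'it']: "stol'n" is not alphabetic

# Words that reach the vocabulary only under the nltk 3.5 behaviour.
print(sorted(set(kept_35) - set(kept_365)))  # ['n', 'stol']
```

Under the 3.6.5 behaviour the pieces of the contraction never reach the vocabulary, which is why pre-splitting them restores the expected vocabulary size.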

So I think the expected counts in w4_unittest.test_gradient_descent(…) should be updated.

ThanQ…

Jeongho_Jang · September 13, 2023, 11:36pm

Thank you for letting us know nltk == 3.5.* works for the unit test.
