C3W1_Assignment - punctuation in remove_stopwords()

Crimeanlion · July 16, 2023, 10:18am

While doing the “Week 1” assignment, I was a bit surprised that my results didn’t match the expected output after parsing the “./data/bbc-text.csv” file: I got fewer words after calling the “remove_stopwords” function.

After doing a few tests I discovered that this assignment doesn’t expect the student to take punctuation into account while removing stopwords from the sentences. So when the stop list contains the word “them”, and we have two sentences:

… for them in terms …
… they play on them.
According to the assignment’s expected output, “them” should be removed only from the first one, because in the second one the token is “them.” with a trailing period.

What is the reason for this? If I process the test data from this assignment and remove stopwords with punctuation in mind, the final vocabulary gets smaller by 42 words:
== My result: Vocabulary contains 29672 words
== Expected: Vocabulary contains 29714 words

Here are some examples of tokens from the test data that could be removed if punctuation were stripped first: “them.”, “it.”, “for!”, “(you”, “up)”, “-which”, “[when”, “than…”, “it:”, etc.
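To make the difference concrete, here is a minimal sketch of the two behaviours. It assumes the expected output corresponds to plain whitespace splitting with exact token matching; the function names and the one-word stop list are hypothetical and are not taken from the assignment's code.

```python
import string

# Hypothetical one-word stop list for illustration; the assignment's list is much longer.
STOPWORDS = {"them"}

def remove_stopwords_whitespace(sentence, stopwords=STOPWORDS):
    """Drop tokens that exactly match a stopword after lowercasing and splitting on whitespace."""
    words = sentence.lower().split()
    return " ".join(w for w in words if w not in stopwords)

def remove_stopwords_punct_aware(sentence, stopwords=STOPWORDS):
    """Also drop tokens that match a stopword once leading/trailing punctuation is stripped."""
    # string.punctuation covers ASCII punctuation; the Unicode ellipsis is added by hand.
    strip_chars = string.punctuation + "…"
    words = sentence.lower().split()
    return " ".join(w for w in words if w.strip(strip_chars) not in stopwords)

print(remove_stopwords_whitespace("for them in terms"))     # for in terms
print(remove_stopwords_whitespace("they play on them."))    # they play on them.
print(remove_stopwords_punct_aware("they play on them."))   # they play on
```

With exact matching, “them.” is a different token from “them”, so it survives and ends up in the vocabulary as a separate entry. That would be consistent with the larger expected vocabulary count.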

saifkhanengr · July 16, 2023, 10:26am

You posted this in the General Discussion category. Please move it to the relevant course category as described here and our mentors will be happy to assist you.

Crimeanlion · July 16, 2023, 10:41am

Thank you. For some reason, I couldn’t find the proper category at first.
