C3W1_Assignment - punctuation in remove_stopwords()

While doing the “Week 1” assignment, I was a bit surprised because my results didn’t match the expected output after parsing the “./data/bbc-text.csv” file: I got fewer words after calling the “remove_stopwords” function.

After a few tests, I discovered that this assignment doesn’t expect the student to take punctuation into account when removing stopwords from the sentences. For example, suppose the stop list contains the word “them” and we have two sentences:

  1. … for them in terms …
  2. … they play on them.

According to the assignment’s expectation, “them” should be removed only from the first one.
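
To make the behaviour concrete, here is a minimal sketch of what I believe the expected output implies: splitting on whitespace only and comparing whole tokens. This is my own reconstruction, not the assignment's reference code; the function name `remove_stopwords_whitespace` and the one-word stop list are hypothetical.

```python
def remove_stopwords_whitespace(sentence, stopwords):
    """Drop stopwords after splitting on whitespace only (assumed behaviour)."""
    # "them." keeps its trailing period, so it never equals the stopword "them".
    words = sentence.lower().split()
    return " ".join(word for word in words if word not in stopwords)


stopwords = {"them"}  # hypothetical one-word stop list, for illustration only

print(remove_stopwords_whitespace("for them in terms", stopwords))
# prints: for in terms
print(remove_stopwords_whitespace("they play on them.", stopwords))
# prints: they play on them.
```

With this approach the second sentence keeps “them.” because the token includes the period.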

What is the reason for this? If I process the test data from this assignment and remove stopwords with punctuation in mind, the final vocabulary is smaller by 42 words:
== My result: Vocabulary contains 29672 words
== Expected: Vocabulary contains 29714 words

Here are some examples of tokens from the test data that could be removed: “them.”, “it.”, “for!”, “(you”, “up)”, “-which”, “[when”, “than…”, “it:”, etc.
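
For comparison, a punctuation-aware check could look roughly like the sketch below. It strips leading and trailing punctuation from each token before looking it up in the stop list. The helper name `is_stopword_ignoring_punctuation` and the small stop list are my own assumptions for illustration, and I add the Unicode ellipsis “…” by hand because it is not part of Python's `string.punctuation`.

```python
import string

# ASCII punctuation plus the Unicode ellipsis seen in tokens such as "than…".
PUNCTUATION = string.punctuation + "…"


def is_stopword_ignoring_punctuation(token, stopwords):
    """Return True if the token is a stopword once edge punctuation is stripped."""
    core = token.strip(PUNCTUATION).lower()
    return core in stopwords


# Hypothetical stop list covering only the sample tokens from this post.
stopwords = {"them", "it", "for", "you", "up", "which", "when", "than"}
tokens = ["them.", "it.", "for!", "(you", "up)", "-which", "[when", "than…", "it:"]

for token in tokens:
    print(token, is_stopword_ignoring_punctuation(token, stopwords))
```

Every sample token above is reported as a stopword by this check, whereas a plain whitespace split would keep all of them.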

You posted this in the General Discussion category. Please move it to the relevant course category as described here and our mentors will be happy to assist you.

Thank you. For some reason, I couldn’t find the proper category at first.