Parse CSV incorrect result: first sentence has 737 words (after removing stopwords)

Hello!

My parse_data_from_file function does not return the expected result: the first sentence should have 436 words after removing stopwords.

My result looks like:

There are 2225 sentences in the dataset.

First sentence has 737 words (after removing stopwords).

There are 2225 labels in the dataset.

The first 5 labels are ['tech', 'business', 'sport', 'sport', 'entertainment']

I tested the remove_stopwords function separately, and it did give 436 for the first sentence, as seen in output[2]. I also tried parsing the second sentence, and it returned 1431 after removing stopwords.

Thank you so much! :smiley:

Please remove the stopwords from each sentence before appending it to the sentences list.
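To make that concrete, here is a rough sketch of what the loop might look like. It is not the assignment's reference solution. It assumes the CSV has a header row with the label in the first column and the text in the second, and the STOPWORDS set and remove_stopwords body below are small hypothetical stand-ins for the ones in your notebook.

```python
import csv

# Hypothetical, tiny stopword list for illustration only;
# the list in the notebook is much longer.
STOPWORDS = {"a", "an", "the", "is", "in", "of", "to", "and"}


def remove_stopwords(sentence):
    """Lowercase the sentence and drop every word found in STOPWORDS."""
    words = sentence.lower().split()
    return " ".join(w for w in words if w not in STOPWORDS)


def parse_data_from_file(filename):
    """Read (label, text) rows and return cleaned sentences plus labels."""
    sentences = []
    labels = []
    with open(filename, "r", newline="") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        next(reader)  # skip the header row (assumed to be present)
        for row in reader:
            labels.append(row[0])
            # Clean the text first, then store it. Appending row[1] directly
            # would keep the stopwords and inflate the word count.
            sentences.append(remove_stopwords(row[1]))
    return sentences, labels
```

If the function appends the raw text instead, the sentence and label counts still look right, but the per-sentence word count comes out too high, which matches the symptom described above.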


Oh Geeeez! Yes totally! Thank you so much!