Train data vocabulary mismatch

I have 14823 words in my vocabulary, but the expected solution has 14821, and I can't find the error.
I am talking about the cells after ex7 at the end of part 1. All other results (including the unit tests) are correct.
The vocabulary function itself is correct, so I'm not sure where the difference comes from.

Hi @gkouro

Are you running the notebook on Coursera or locally? If locally, check that nltk.__version__ is '3.5', because preprocessing (in particular for emojis) is different in later nltk versions.
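A quick way to check this from a notebook cell is sketched below. It is only an illustration: the helper names `installed_version` and `version_matches` are made up for this post, and it assumes an exact match with 3.5 is what you want.

```python
from importlib import metadata

EXPECTED_NLTK = "3.5"  # the version mentioned above for the course notebook


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_matches(installed, expected):
    """True only when the installed version is exactly the expected one."""
    return installed is not None and installed == expected


found = installed_version("nltk")
if version_matches(found, EXPECTED_NLTK):
    print(f"nltk {found} matches the expected version")
else:
    # Pinning the version is one way to reproduce the course environment.
    print(f"nltk is {found}, expected {EXPECTED_NLTK}; try: pip install nltk==3.5")
```

If the versions differ, installing the pinned version in a separate virtual environment keeps it from clashing with other projects that need a newer nltk.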

I guess you are right. I have been running it locally, and the version I installed is 3.8.1.