C1_W4_Assignment - Test the model - preprocessing error

Hi,

When we apply the tokenize function (the one used while training) instead of just nltk.word_tokenize, the predictions are quite different. In practice we should apply the same preprocessing to the test data that was applied to the training data.
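
For illustration, here is a minimal sketch of the mismatch. The tokenize helper below is only my reconstruction of what the assignment's version roughly does (lowercasing, punctuation removal, Porter stemming); the actual notebook implementation may differ:

```python
import string

import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)

stemmer = PorterStemmer()

def tokenize(sentence):
    """Rough reconstruction of the training-time tokenize helper
    (assumed: lowercase, split, drop punctuation, Porter-stem)."""
    tokens = word_tokenize(sentence.lower())
    return [stemmer.stem(t) for t in tokens if t not in string.punctuation]

sentence = "The model's predictions were surprisingly different!"
print(word_tokenize(sentence))  # raw tokens: case and punctuation kept
print(tokenize(sentence))       # the pipeline the model was trained on
```

Feeding the first output to a model trained on the second gives it tokens it has never seen in that form, which is why the predictions diverge.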

Thanks and Regards
Aroonima

Hi,

It is mentioned in the notebook: 'To split a sentence into tokens you can use word_tokenize method. It will separate words, punctuation, and apply some stemming.'
But actually nltk's word_tokenize only splits the text into tokens; it neither removes punctuation nor applies any stemming. Those steps have to be handled separately.
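
A quick check shows this (the example sentence is just for illustration):

```python
import nltk
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)

print(word_tokenize("Running quickly, she stopped!"))
# ['Running', 'quickly', ',', 'she', 'stopped', '!']
# Punctuation is split off into its own tokens but NOT removed,
# and 'Running'/'stopped' are not stemmed to 'run'/'stop'.
```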

Thanks and Regards
Aroonima