C1W4 - Tokenizer


I was wondering why we use the tokenizer to split the review body up, just to put it back together with a few errors (e.g. "do n't")? Is this something we expect the text produced by our NLP algorithms to look like, so we need a robust way to fix it?

All the best

Yes, the purpose of the tokenization step here is to understand how the text is actually processed by the computer. Tokenizers like NLTK's word_tokenize deliberately split contractions, so "don't" becomes the two tokens "do" and "n't"; when you naively join the tokens back with spaces, you see artifacts like "do n't". That isn't a bug so much as a window into the token stream the downstream algorithms actually see.
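To make the behavior concrete, here is a minimal sketch (the function name is hypothetical, and this is a simplified stand-in for a real Penn Treebank-style tokenizer such as NLTK's word_tokenize) showing how splitting contractions produces the "do n't" artifact when tokens are rejoined with spaces:

```python
import re

def split_contractions(text):
    # Simplified Penn Treebank-style rule: split trailing "n't"
    # off a word, so "don't" -> ["do", "n't"].
    tokens = []
    for word in text.split():
        m = re.match(r"(.*\w)(n't)$", word, re.IGNORECASE)
        if m:
            tokens.extend([m.group(1), m.group(2)])
        else:
            tokens.append(word)
    return tokens

tokens = split_contractions("I don't like it")
print(tokens)             # ['I', 'do', "n't", 'like', 'it']
print(" ".join(tokens))   # "I do n't like it"  <- the artifact from the question
```

Joining with `" ".join(tokens)` is what reintroduces the odd spacing; a proper detokenizer would need rules for reattaching contractions and punctuation.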