C3W1 incorrect word count from fit_tokenizer() function

Both my remove_stopwords() and parse_data_from_file() functions return the expected output.

However, when I run my fit_tokenizer() function, I get 30,888 words instead of the expected 29,741. My changes to the function are quite simple (only two lines), and it runs without throwing an error.

Has anyone else seen this result?

Best Regards -
Jim


Hi @James_Watkins,

As far as I understand, this part does not reuse the previous steps. We need to transform the sentences into sequences of tokens, following the instructions in the assignment. I believe there is a similar example in the ungraded labs. The steps are: initialize the class, then train it on the sentences.
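For reference, here is a minimal sketch of those two steps, assuming the tensorflow.keras Tokenizer used in the labs. The function name and the "<OOV>" value mirror the assignment, but this is illustrative, not the graded solution:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

def fit_tokenizer(sentences):
    # Step 1: initialize the class, including the out-of-vocabulary
    # token ("<OOV>" is the value the course labs typically use).
    tokenizer = Tokenizer(oov_token="<OOV>")
    # Step 2: train it on the list of sentence strings.
    tokenizer.fit_on_texts(sentences)
    return tokenizer

# The word count under discussion is the size of the resulting index:
tokenizer = fit_tokenizer(["deep learning is fun", "learning never stops"])
print(len(tokenizer.word_index))  # 6 distinct words plus the OOV token -> 7
```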

Indeed, I made use of the code examples from the ungraded assignment, including the initialization of the tokenizer with the oov_token argument.

Is that what you are suggesting?

Yes, that is what I was suggesting. I thought you might have reused some of the functions defined earlier. If possible, click on my name and send me your notebook so I can check it, okay?

I currently have the exact same issue. My earlier code changes are fine, but for some reason I am seeing 30,888 as well, and it is throwing off the word indexing too. Any suggestions would be appreciated.

Dave

I recall that the issue for me was that I was incorrectly passing a list of strings, rather than a string, to the remove_stopwords(sentence) function.

I would suggest you look at where you are creating the sentence string and make sure you are actually creating a single string.
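Here is a minimal sketch of the pitfall (not the assignment's code; the stopword set and function body are illustrative assumptions, and only the string-vs-list point matters):

```python
STOPWORDS = {"a", "the", "is", "in", "of"}  # the real assignment uses a longer list

def remove_stopwords(sentence):
    """Expects a single string; returns it with stopwords removed."""
    words = sentence.lower().split()
    return " ".join(w for w in words if w not in STOPWORDS)

row = ["tech", "the screen is bright"]  # label, text - like one csv row

# Correct: pass only the text column as one string.
print(remove_stopwords(row[1]))    # -> "screen bright"

# Incorrect: coercing the whole row to a string keeps the label and the
# list punctuation, so extra "words" survive stopword removal and the
# vocabulary grows, all without raising an error.
print(remove_stopwords(str(row)))  # -> "['tech', 'the screen bright']"
```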

Thanks James, that resolved my issue as well.

Dave