C3W1 incorrect word count from fit_tokenizer() function

Both my remove_stopwords() and parse_data_from_file() functions return the expected output.

However, when I run my fit_tokenizer() function, I get 30,888 words instead of the expected 29,741. My changes to the function are quite simple (only two lines), and it runs without throwing an error.

Has anyone else seen this result?

Best Regards -
Jim


Hi @James_Watkins,

As far as I understand, this part does not reuse the previous steps. We need to transform the sentences into sequences of tokens, following the instructions in the assignment. I believe there is a similar example in the ungraded labs. The steps are: initialize the class, then train it on the sentences.
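For reference, here is a minimal sketch of those two steps, assuming the tensorflow.keras Tokenizer used in the labs. The function name and the "<OOV>" value mirror the assignment, but this is illustrative, not the graded solution:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

def fit_tokenizer(sentences):
    # Step 1: initialize the class, including the out-of-vocabulary
    # token ("<OOV>" is the value the course labs typically use).
    tokenizer = Tokenizer(oov_token="<OOV>")
    # Step 2: train it on the list of sentence strings.
    tokenizer.fit_on_texts(sentences)
    return tokenizer

# The word count under discussion is the size of the resulting index:
tokenizer = fit_tokenizer(["deep learning is fun", "learning never stops"])
print(len(tokenizer.word_index))  # 6 distinct words plus the OOV token -> 7
```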

Indeed, I made use of the code examples from the ungraded assignment, including the initialization of the tokenizer with the oov_token argument.

Is that what you are suggesting?

Yes, that is what I was suggesting. I thought you might have reused some of the functions defined earlier. If possible, click on my name and send me your notebook so I can check it, okay?

I currently have the exact same issue. My earlier code changes are fine, but for some reason I am seeing 30,888 as well, and it is throwing off the word indexing too. Any suggestions would be appreciated.

Dave

I recall that the issue for me was that I was incorrectly passing a list of strings, rather than a string, to the remove_stopwords(sentence) function.

I would suggest you look at where you are creating the sentence string and make sure you are actually creating a single string.
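Here is a minimal sketch of the pitfall (not the assignment's code; the stopword set and function body are illustrative assumptions, and only the string-vs-list point matters):

```python
STOPWORDS = {"a", "the", "is", "in", "of"}  # the real assignment uses a longer list

def remove_stopwords(sentence):
    """Expects a single string; returns it with stopwords removed."""
    words = sentence.lower().split()
    return " ".join(w for w in words if w not in STOPWORDS)

row = ["tech", "the screen is bright"]  # label, text - like one csv row

# Correct: pass only the text column as one string.
print(remove_stopwords(row[1]))    # -> "screen bright"

# Incorrect: coercing the whole row to a string keeps the label and the
# list punctuation, so extra "words" survive stopword removal and the
# vocabulary grows, all without raising an error.
print(remove_stopwords(str(row)))  # -> "['tech', 'the screen bright']"
```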

Thanks James, that resolved my issue as well.

Dave