Hi guys!
I have a problem with the 00V token detection. When I fit the tokenizer and sequence the sentences, I get the correct number of words in the vocabulary, but I also get a "token NOT included in vocabulary" error.
I initialized the tokenizer the same way I did during the class, but I still don't get the 00V token included in the vocabulary.
Any tips?
My lab: ldzjcyddmhcu
Please click my name and message your notebook as an attachment.
Please find the feedback below:
- The out-of-vocabulary token to use is `<OOV>` and not `<00V>` (the upper-case letter O, not zeros).
- See the function `fit_tokenizer` to understand how the word-to-index mapping is referenced from inside the tokenizer instance in the test code; this will help you fix the mistakes in `tokenize_labels`. (A minimal sketch follows below.)
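For reference, here is a minimal sketch of the correct initialization, assuming the TensorFlow/Keras `Tokenizer` used in the course labs (the sample sentences and variable names are illustrative, not from the assignment):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = ["I love my dog", "I love my cat"]  # illustrative data

# The OOV token must be spelled with the upper-case letter O, not zeros:
tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)

# The word-to-index mapping lives on the instance as word_index;
# "<OOV>" should appear in it (typically at index 1).
print(tokenizer.word_index)

# Unseen words now map to the OOV index instead of being dropped:
print(tokenizer.texts_to_sequences(["I love my hamster"]))
```

With `oov_token` set this way, the test code can find `"<OOV>"` in `tokenizer.word_index`, and any word not seen during fitting is mapped to the OOV index rather than silently skipped.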