Hi guys!
I have a problem with the 00V token detection. When I fit the tokenizer and sequence the sentences, I get the correct number of words in the vocabulary, but I also get a "token NOT included in vocabulary" error.
I initialized the tokenizer the same way I did during the class, but I still don't get the 00V token included in the vocabulary.
Any tips?
My lab: ldzjcyddmhcu
Please click my name and message your notebook as an attachment.
Please find the feedback below:
- The out-of-vocabulary token to use is `<OOV>` and not `<00V>` (the upper-case letter O, not zeros).
- See the function `fit_tokenizer` to understand how the word-to-index mapping is referenced from inside the tokenizer instance in the test code; this will help you fix the mistakes in `tokenize_labels`. (A minimal sketch follows below.)
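For reference, here is a minimal sketch of the correct initialization, assuming the TensorFlow/Keras `Tokenizer` used in the course labs (the sample sentences and variable names are illustrative, not from the assignment):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = ["I love my dog", "I love my cat"]  # illustrative data

# The OOV token must be spelled with the upper-case letter O, not zeros:
tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)

# The word-to-index mapping lives on the instance as word_index;
# "<OOV>" should appear in it (typically at index 1).
print(tokenizer.word_index)

# Unseen words now map to the OOV index instead of being dropped:
print(tokenizer.texts_to_sequences(["I love my hamster"]))
```

With `oov_token` set this way, the test code can find `"<OOV>"` in `tokenizer.word_index`, and any word not seen during fitting is mapped to the OOV index rather than silently skipped.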