I think I found an error in the notebook. In this section it says that the missing word 'lanigan' should correspond to the index 70. However, if I create a list of the keys in the word_index and then get their index, it does not match: the index is off by one, and I have to use an index of 69
to get 'lanigan'.
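Roughly what I tried, as a sketch (assuming tokenizer is the one already fitted on the lab corpus):

keys = list(tokenizer.word_index.keys())
print(keys.index('lanigan'))   # prints 69, not 70
print(keys[69])                # prints 'lanigan'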
Dear Nick,
Thanks for the great question,
It is a great observation, but I would like to point out that the index 0 is reserved as the padding token, so the tokenizer's word index starts counting at 1 rather than 0.
Please check this code snippet:
from tensorflow.keras.preprocessing.text import Tokenizer

# Initialize the Tokenizer class
tokenizer = Tokenizer()
# Generate the word index dictionary
tokenizer.fit_on_texts(corpus)
# Define the total words. You add 1 for the index `0` which is just the padding token.
total_words = len(tokenizer.word_index) + 1
print(f'word index dictionary: {tokenizer.word_index}')
print(f'total words: {total_words}')
Thus,
you are comparing two different numberings: the values in word_index start at 1 (index 0 is reserved for padding), while a Python list built from its keys starts at 0, which is exactly why your result is off by one.
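For example, here is a minimal sketch using the tokenizer from the snippet above (assuming, as in your example, that 'lanigan' received the index 70):

# dictionary values start at 1 because index 0 is reserved for padding
print(tokenizer.word_index['lanigan'])                     # 70
# a list built from the keys is 0-indexed, so it lags by exactly one
print(list(tokenizer.word_index.keys()).index('lanigan'))  # 69
# total_words adds one extra slot so that index 0 (padding) is covered
print(total_words == len(tokenizer.word_index) + 1)        # True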
Please feel free to discuss this further, and all the best with your journey,
Thanks,
Thamer