I think I found an error in the notebook. In this section it says that the missing word 'lanigan' should correspond to the index 70. However, if I create a list of the keys in the word_index and then get their index, it does not match: the index is off by one, and I have to use an index of 69
to get 'lanigan'.
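Roughly what I tried, as a sketch (assuming tokenizer is the one already fitted on the lab corpus):

keys = list(tokenizer.word_index.keys())
print(keys.index('lanigan'))   # prints 69, not 70
print(keys[69])                # prints 'lanigan'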
Dear Nick,
Thanks for the great question,
It is a great observation, but I would like to point out that the index 0 is reserved as the padding token, so the tokenizer's word index starts counting at 1 rather than 0.
Please check this code snippet:
from tensorflow.keras.preprocessing.text import Tokenizer

# Initialize the Tokenizer class
tokenizer = Tokenizer()
# Generate the word index dictionary
tokenizer.fit_on_texts(corpus)
# Define the total words. You add 1 for the index `0` which is just the padding token.
total_words = len(tokenizer.word_index) + 1
print(f'word index dictionary: {tokenizer.word_index}')
print(f'total words: {total_words}')
Thus,
you are comparing two different numberings: the values in word_index start at 1 (index 0 is reserved for padding), while a Python list built from its keys starts at 0, which is exactly why your result is off by one.
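For example, here is a minimal sketch using the tokenizer from the snippet above (assuming, as in your example, that 'lanigan' received the index 70):

# dictionary values start at 1 because index 0 is reserved for padding
print(tokenizer.word_index['lanigan'])                     # 70
# a list built from the keys is 0-indexed, so it lags by exactly one
print(list(tokenizer.word_index.keys()).index('lanigan'))  # 69
# total_words adds one extra slot so that index 0 (padding) is covered
print(total_words == len(tokenizer.word_index) + 1)        # True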
Please feel free to discuss this further, and all the best with your journey,
Thanks,
Thamer