C3_W4_Lab_1.ipynb | C3 - Natural Language Processing in Tensorflow

Hi

In "C3_W4_Lab_1.ipynb", there is a cell that originally reads:

# Get sample sentence
sentence = corpus[0].split()
print(f'sample sentence: {sentence}')

# Initialize token list
token_list = []

# Look up the indices of each word and append to the list
for word in sentence:
    token_list.append(tokenizer.word_index[word])

# Print the token list
print(token_list)


I'm trying to test "sentence = corpus[1].split()", that is, changing the index from 0 to 1.

But when I did, an error occurred:
"
KeyError                                  Traceback (most recent call last)

in <cell line: 9>()
      8 # Look up the indices of each word and append to the list
      9 for word in sentence:
---> 10     token_list.append(tokenizer.word_index[word])
     11
     12 # Print the token list

KeyError: 'pound.'
"
Why does it not work?

Best regards!

This is the output of corpus[1].split():
sample sentence: ['battered', 'away', 'til', 'he', 'hadnt', 'a', 'pound.']

The word 'pound' does exist in tokenizer.word_index:

>>> tokenizer.word_index['pound']
73

The word 'pound.' doesn't exist in the word index because Tokenizer removes punctuation (see its filters argument) when building the word index.
Python's str.split() doesn't remove punctuation, which is why you get a KeyError.
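The mismatch can be reproduced without the full notebook. A minimal sketch (the filters string below is the Keras Tokenizer's documented default; the example sentence is taken from your output above):

```python
# Keras Tokenizer strips every character in its default `filters` string
# (all punctuation plus tab/newline) before building word_index.
# Plain str.split() does no such cleaning, so 'pound.' survives with its dot.
filters = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

sentence = "battered away til he hadnt a pound."

# What str.split() gives you -- the trailing '.' stays attached:
raw_tokens = sentence.split()
print(raw_tokens)       # ['battered', ..., 'pound.']

# What the Tokenizer effectively sees -- punctuation replaced by spaces first:
cleaned = sentence.translate(str.maketrans(filters, ' ' * len(filters)))
tokens = cleaned.split()
print(tokens)           # ['battered', ..., 'pound']
```

In practice, rather than splitting and looking words up by hand, you can let the tokenizer do the conversion itself, e.g. token_list = tokenizer.texts_to_sequences([corpus[1]])[0], which applies the same filtering it used when building word_index.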