C3_W4_Lab_1.ipynb | C3 - Natural Language Processing in Tensorflow

Hi

In "C3_W4_Lab_1.ipynb", there is a cell that originally reads:

# Get sample sentence
sentence = corpus[0].split()
print(f'sample sentence: {sentence}')

# Initialize token list
token_list = []

# Look up the indices of each word and append to the list
for word in sentence:
    token_list.append(tokenizer.word_index[word])

# Print the token list
print(token_list)


I'm trying to test "sentence = corpus[1].split()", that is, changing the index from 0 to 1.

But when I did, an error occurred:
"
KeyError                                  Traceback (most recent call last)

in <cell line: 9>()
      8 # Look up the indices of each word and append to the list
      9 for word in sentence:
---> 10     token_list.append(tokenizer.word_index[word])
     11
     12 # Print the token list

KeyError: 'pound.'
"
Why does it not work?

Best regards!

This is the output of corpus[1].split():
sample sentence: ['battered', 'away', 'til', 'he', 'hadnt', 'a', 'pound.']

The word 'pound' does exist in tokenizer.word_index:

>>> tokenizer.word_index['pound']
73

The word 'pound.' doesn't exist in the word index because Tokenizer removes punctuation (see its filters argument) when building the word index.
Python's str.split() doesn't remove punctuation, which is why you get a KeyError.
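The mismatch can be reproduced without the full notebook. A minimal sketch (the filters string below is the Keras Tokenizer's documented default; the example sentence is taken from your output above):

```python
# Keras Tokenizer strips every character in its default `filters` string
# (all punctuation plus tab/newline) before building word_index.
# Plain str.split() does no such cleaning, so 'pound.' survives with its dot.
filters = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'

sentence = "battered away til he hadnt a pound."

# What str.split() gives you -- the trailing '.' stays attached:
raw_tokens = sentence.split()
print(raw_tokens)       # ['battered', ..., 'pound.']

# What the Tokenizer effectively sees -- punctuation replaced by spaces first:
cleaned = sentence.translate(str.maketrans(filters, ' ' * len(filters)))
tokens = cleaned.split()
print(tokens)           # ['battered', ..., 'pound']
```

In practice, rather than splitting and looking words up by hand, you can let the tokenizer do the conversion itself, e.g. token_list = tokenizer.texts_to_sequences([corpus[1]])[0], which applies the same filtering it used when building word_index.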