I am unable to implement Exercise 3 - "Implement the question answering function."
Can anyone here help me with this last exercise of the NLP with Attention Models course? It is in the Week 3 notebook.
Hi @m_hassan
Here are the steps for the answer_question function,
for the "Expected Output:" cell example. Where "question" is:
question: When was the Chechen-Ingush Autonomous Soviet Socialist Republic transferred from the Georgian SSR? context: On January 9, 1957, Karachay Autonomous Oblast and Chechen-Ingush Autonomous Soviet Socialist Republic were restored by Khrushchev and they were transferred from the Georgian SSR back to the Russian SFSR.
And "answer" is:
answer: January 9, 1957
Steps you have to implement
(and the intermediate values that you can check against):
QUESTION SETUP
Step 1:
# Tokenize the question
print(tokenized_question)
<tf.Tensor: shape=(79,), dtype=int32, numpy=
array([ 822, 10, 366, 47, 8, 2556, 1559, 18, 1570,
122, 8489, 2040, 3114, 1162, 12873, 2730, 343, 5750,
10250, 45, 8, 5664, 29, 180, 6857, 58, 2625,
10, 461, 1762, 9902, 24011, 6, 17422, 3441, 63,
2040, 3114, 1162, 411, 21234, 11, 2556, 1559, 18,
1570, 122, 8489, 2040, 3114, 1162, 12873, 2730, 343,
5750, 130, 13216, 57, 13495, 17363, 13847, 11, 79,
130, 10250, 45, 8, 5664, 29, 180, 6857, 223,
12, 8, 4263, 3, 7016, 6857, 5], dtype=int32)>
Step 2:
# Add an extra dimension to the tensor
print(tokenized_question)
tf.Tensor(
[[ 822 10 366 47 8 2556 1559 18 1570 122 8489 2040
3114 1162 12873 2730 343 5750 10250 45 8 5664 29 180
6857 58 2625 10 461 1762 9902 24011 6 17422 3441 63
2040 3114 1162 411 21234 11 2556 1559 18 1570 122 8489
2040 3114 1162 12873 2730 343 5750 130 13216 57 13495 17363
13847 11 79 130 10250 45 8 5664 29 180 6857 223
12 8 4263 3 7016 6857 5]], shape=(1, 79), dtype=int32)
Step 3:
# Pad the question tensor
print(padded_question)
[[ 822 10 366 47 8 2556 1559 18 1570 122 8489 2040
3114 1162 12873 2730 343 5750 10250 45 8 5664 29 180
6857 58 2625 10 461 1762 9902 24011 6 17422 3441 63
2040 3114 1162 411 21234 11 2556 1559 18 1570 122 8489
2040 3114 1162 12873 2730 343 5750 130 13216 57 13495 17363
13847 11 79 130 10250 45 8 5664 29 180 6857 223
12 8 4263 3 7016 6857 5 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]]
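If it helps, the three question-setup steps can be sketched roughly like this. I am using NumPy stand-ins so the shapes are easy to check; the tokenize helper and the encoder_maxlen value are assumptions for illustration, and the real notebook works with tf.Tensor objects and the notebook's own tokenizer:

```python
import numpy as np

# Stand-in for the notebook's tokenizer: maps a string to a 1-D array of
# token ids. The ids returned here are invented for illustration only.
def tokenize(text):
    return np.array([822, 10, 366], dtype=np.int32)

encoder_maxlen = 150  # assumed padding length; check your notebook's value

# Step 1: tokenize the question -> shape (n,)
tokenized_question = tokenize("question: ... context: ...")

# Step 2: add a batch dimension at the front -> shape (1, n)
tokenized_question = np.expand_dims(tokenized_question, axis=0)

# Step 3: right-pad with zeros up to encoder_maxlen -> shape (1, encoder_maxlen)
padded_question = np.pad(
    tokenized_question,
    ((0, 0), (0, encoder_maxlen - tokenized_question.shape[1])),
)
```

Note the order: the batch dimension is added before padding, so the pad widths apply only to the second (token) axis.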
ANSWER SETUP
Step 4:
# Tokenize the answer
# Hint: All answers begin with the string "answer: "
print(tokenized_answer)
tf.Tensor([1525 10], shape=(2,), dtype=int32)
Step 5:
# Add an extra dimension to the tensor
print(tokenized_answer)
tf.Tensor([[1525 10]], shape=(1, 2), dtype=int32)
Step 6:
# Get the id of the EOS token
print(eos)
tf.Tensor(1, shape=(), dtype=int32)
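A rough sketch of the answer setup, with the token ids hard-coded to the values printed above (in the notebook, both the "answer: " ids and eos come from the tokenizer, not from literals):

```python
import numpy as np

ANSWER_PREFIX_IDS = [1525, 10]  # ids for "answer: " as printed above
EOS_ID = 1                      # id of the EOS token as printed above

# Step 4: tokenize the answer prefix -> shape (2,)
tokenized_answer = np.array(ANSWER_PREFIX_IDS, dtype=np.int32)

# Step 5: add a batch dimension -> shape (1, 2)
tokenized_answer = np.expand_dims(tokenized_answer, axis=0)

# Step 6: the id of the EOS token (hard-coded here; the notebook derives
# it from the tokenizer)
eos = np.int32(EOS_ID)
```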
Step 7:
# Loop for decoder_maxlen iterations
___First loop iteration (i=0):
Step 7.1:
# Predict the next word using the model, the input document and the current state of output
print(next_word)
tf.Tensor([[1762]], shape=(1, 1), dtype=int32)
Step 7.2:
# Concat the predicted next word to the output
print(tokenized_answer)
tf.Tensor([[1525 10 1762]], shape=(1, 3), dtype=int32)
Step 7.3:
# The text generation stops if the model predicts the EOS token
tf.Tensor([[False]], shape=(1, 1), dtype=bool)
___Second loop iteration (i=1):
Step 7.1:
# Predict the next word using the model, the input document and the current state of output
print(next_word)
tf.Tensor([[9902]], shape=(1, 1), dtype=int32)
Step 7.2:
# Concat the predicted next word to the output
print(tokenized_answer)
tf.Tensor([[1525 10 1762 9902]], shape=(1, 4), dtype=int32)
Step 7.3:
# The text generation stops if the model predicts the EOS token
tf.Tensor([[False]], shape=(1, 1), dtype=bool)
___ ⌠other remaining loop iterations
(4 total, the 4th step predicts âEOSâ token and the loop stops with the final answer:)
print(tokenized_answer)
tf.Tensor([[ 1525 10 1762 9902 24011 1]], shape=(1, 6), dtype=int32)
Which stands for:
"answer: January 9, 1957"
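The decoding loop itself can be sketched like this. The model is replaced by a stub that replays the token ids from the walkthrough above; predict_next_word is a hypothetical name, and in the real notebook you call the trained transformer and take the argmax over the last position's logits:

```python
import numpy as np

decoder_maxlen = 50  # assumed iteration limit
eos = 1              # EOS token id, as printed in Step 6

# Stub standing in for the trained model: replays the ids from the
# walkthrough above instead of running a forward pass.
scripted = iter([1762, 9902, 24011, 1])
def predict_next_word(padded_question, tokenized_answer):
    return np.array([[next(scripted)]], dtype=np.int32)

padded_question = np.zeros((1, 150), dtype=np.int32)       # placeholder
tokenized_answer = np.array([[1525, 10]], dtype=np.int32)  # "answer: "

# Step 7: greedy decoding loop
for i in range(decoder_maxlen):
    # 7.1 predict the next word from the document and current output
    next_word = predict_next_word(padded_question, tokenized_answer)
    # 7.2 append the prediction to the running output
    tokenized_answer = np.concatenate([tokenized_answer, next_word], axis=-1)
    # 7.3 stop once the model predicts the EOS token
    if next_word[0, 0] == eos:
        break
```

With the scripted ids this reproduces the final tensor above, [[1525 10 1762 9902 24011 1]].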
Cheers
First question: are you using the updated version of the NLP Specialization? The updated version has a different comment at line 14:
# Tokenize the question
and not:
# Tokenize question and context
Make sure you're using the latest course materials first.
Second, to address your error: the SentencePiece tokenizer does not have an encode method. Look at the previous usage of the tokenizer in the notebook. You will see that we make use of tokenize or detokenize or other methods, but never encode.
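To illustrate the naming only, here is a toy object with the same method names as the notebook's tokenizer. The vocabulary and implementation are entirely invented; the real tokenizer is a trained SentencePiece model and does not split on whitespace like this:

```python
# Toy mock exposing tokenize/detokenize (the notebook's API), not encode.
class MockTokenizer:
    def __init__(self, vocab):
        self.id_of = {w: i for i, w in enumerate(vocab)}
        self.word_of = {i: w for i, w in enumerate(vocab)}

    def tokenize(self, text):
        # string -> list of int token ids
        return [self.id_of[w] for w in text.split()]

    def detokenize(self, ids):
        # list of int token ids -> string
        return " ".join(self.word_of[i] for i in ids)

tok = MockTokenizer(["answer:", "January", "9,", "1957"])
ids = tok.tokenize("answer: January 9, 1957")
print(tok.detokenize(ids))  # prints: answer: January 9, 1957
```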
Cheers
I am using the updated version of the NLP Specialization. @arvyzukai
I would advise refreshing the notebook (read it carefully and save your prior work first), because the code comments do not match the latest assignment. Have you changed the code comments?