I noticed that other learners had some problems with Exercise 6.
Here is what the entire function should accomplish (you can print your variables and check whether they match when running the last cell of the assignment):
# input
print(sentence)
'Peter Parker , the White House director of trade and manufacturing policy of U.S , said in an interview on Sunday morning that the White House was working to prepare for the possibility of a second wave of the coronavirus in the fall , though he said it wouldn ’t necessarily come'
- The first step is to convert the sentence to ids:
# Convert the sentence into ids
# sentence_vectorized = ?
print(sentence_vectorized)
<tf.Tensor: shape=(52,), dtype=int64, numpy=
array([ 2428, 24948, 4, 2, 450, 322, 1288, 6, 379,
9, 2137, 678, 6, 3514, 4, 19, 5, 28,
763, 15, 89, 1445, 16, 2, 450, 322, 20,
591, 7, 2742, 12, 2, 2512, 6, 8, 257,
2005, 6, 2, 1, 5, 2, 1482, 4, 1966,
39, 19, 36, 1, 1, 19636, 629])>
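If you are stuck on this step, here is a minimal sketch, assuming sentence_vectorizer is the TextVectorization layer you adapted earlier in the assignment (calling it on a raw string returns the id tensor):
# Map each token of the raw string to its vocabulary id
sentence_vectorized = sentence_vectorizer(sentence)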
- The second step is to add a fake batch dimension (to get shape (1, 52)):
# Expand its dimension to make it appropriate to pass to the model
# sentence_vectorized = ?
print(sentence_vectorized)
<tf.Tensor: shape=(1, 52), dtype=int64, numpy=
array([[ 2428, 24948, 4, 2, 450, 322, 1288, 6, 379,
9, 2137, 678, 6, 3514, 4, 19, 5, 28,
763, 15, 89, 1445, 16, 2, 450, 322, 20,
591, 7, 2742, 12, 2, 2512, 6, 8, 257,
2005, 6, 2, 1, 5, 2, 1482, 4, 1966,
39, 19, 36, 1, 1, 19636, 629]])>
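One way to add that dimension is with tf.expand_dims (sentence_vectorized is the tensor from the previous step):
import tensorflow as tf

# Insert a new axis at position 0: (52,) -> (1, 52)
sentence_vectorized = tf.expand_dims(sentence_vectorized, axis=0)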
- The next step is to get the model output:
# Get the model output
# output = ?
# the model should output log probabilities for each of the 17 NER tags (for every word)
print(output.shape)
TensorShape([1, 52, 17])
# print first three words' prediction outputs
print(output[0, :3, :])
tf.Tensor(
[[-9.08685303e+00 -1.01892262e+01 -6.59615135e+00 -6.39378500e+00
-1.04870424e+01 -6.25451708e+00 -1.19573958e-02 -7.03884411e+00
-1.00574131e+01 -1.02226448e+01 -1.09528828e+01 -1.06831293e+01
-1.16824932e+01 -9.32558250e+00 -5.93654966e+00 -8.28116989e+00
-5.89271259e+00]
[-1.03360939e+01 -9.50216103e+00 -7.81255722e+00 -8.22268772e+00
-1.07739229e+01 -6.91944027e+00 -4.93170786e+00 -1.12327328e+01
-8.11356544e+00 -8.02027607e+00 -7.17875242e+00 -7.40014744e+00
-1.04091778e+01 -5.82789326e+00 -1.63617153e-02 -8.13122654e+00
-6.24460363e+00]
[-1.41942320e+01 -1.31869612e+01 -1.08198023e+01 -1.51442690e+01
-1.34834719e+01 -1.01789818e+01 -9.31655693e+00 -1.06511927e+01
-1.00971813e+01 -1.10133991e+01 -1.10683813e+01 -1.53163004e+01
-1.31404486e+01 -8.04349327e+00 -7.01542616e+00 -1.05354805e+01
-1.49805343e-03]], shape=(3, 17), dtype=float32)
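A sketch of this step, assuming model is the trained Keras model from the assignment; calling it directly on the batched tensor runs a forward pass:
# Forward pass: returns log probabilities of shape (batch, tokens, tags)
output = model(sentence_vectorized)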
- The next step is to check which prediction is highest for each word:
# Get the predicted labels for each token, using the argmax function and specifying the correct axis to perform the argmax over
# outputs = ?
# you can check the shape
print(outputs.shape)
(1, 52)
# you can check which tag has the highest probability (for each word)
print(outputs)
array([[ 6, 14, 16, 16, 5, 13, 16, 16, 16, 16, 16, 16, 16, 5, 16, 16,
16, 16, 16, 16, 7, 15, 16, 16, 5, 13, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16]])
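One possible implementation with NumPy; the tag scores live on the last axis, so that is the axis to reduce:
import numpy as np

# For every token, pick the index of the highest log probability
outputs = np.argmax(output, axis=-1)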
- The next step is provided for you (it removes the fake batch dimension):
# Next line is just to adjust outputs dimension. Since this function expects only one input to get a prediction, outputs will be something like [[1,2,3]]
# so to avoid heavy notation below, let's transform it into [1,2,3]
# you can check the shape (note: not (1, 52) but (52,))
print(outputs.shape)
(52,)
# check the outputs
print(outputs)
array([ 6, 14, 16, 16, 5, 13, 16, 16, 16, 16, 16, 16, 16, 5, 16, 16, 16,
16, 16, 16, 7, 15, 16, 16, 5, 13, 16, 16, 16, 16, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
16])
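The provided line presumably just drops the leading batch dimension, along the lines of:
# Select the only batch element: (1, 52) -> (52,)
outputs = outputs[0]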
- The next step is to get the text version of the NER labels (also provided for you):
# Get a list of all keys; remember that tag_map was built so that each label id matches its index in the list
print(labels)
['B-art', 'B-eve', 'B-geo', 'B-gpe', 'B-nat', 'B-org', 'B-per', 'B-tim', 'I-art', 'I-eve', 'I-geo', 'I-gpe', 'I-nat', 'I-org', 'I-per', 'I-tim', 'O']
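Assuming tag_map is the dictionary built earlier in the assignment (text label -> id, inserted in id order), the provided code is presumably equivalent to:
# The key at position i is the text label for tag id i
labels = list(tag_map.keys())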
- Lastly, you have to implement the loop over the outputs to get the NER text labels (instead of indices):
# Iterating over every predicted token in outputs list
for tag_idx in ?:  ## loop over all the outputs
    pred_label = ?  ## get the label at index tag_idx
    pred.append(?)  ## append it to our predictions list
This should populate the pred list (one NER tag at a time for every word in the outputs):
['B-per'] # text label at index 6
['B-per', 'I-per'] # text labels at index 6 and 14
['B-per', 'I-per', 'O'] # text labels at index 6, 14 and 16
...
# all the 52 text label predictions
['B-per',
'I-per',
'O',
....
'O',
'O',
'O']
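Filled in, the loop is plain list indexing; a minimal sketch (pred starts as an empty list inside the function):
pred = []
for tag_idx in outputs:           # loop over all 52 predicted tag ids
    pred_label = labels[tag_idx]  # look up the text label for this id
    pred.append(pred_label)       # append it to our predictions list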
This should help you understand what is expected from you. You can also compare your implementation's intermediate values against these to see if and where they deviate.
Cheers