C3W2 Exercise 6 Test fail

I get the following error in E6 whenever I test it:

Your function could not be tested due an error. Please make sure you are passing the correct tensor to the model call. You need to expand its dimension before calling the model.
 Test failed.

I’m already using tf.expand_dims, but I’m unsure which axis I should expand.

Hi @Lujain_Andijani

You have many examples of how to use tf.expand_dims in section “4.3 A note on padding”.

Maybe the problem is not in the dimensions. Did you use sentence_vectorizer to convert the sentence to ids prior to expanding the dims?

Actually, yes, I did that by using sentence_vectorizer(sentence); correct me if I’m wrong.

You’re not wrong; that’s the correct way in this Assignment to convert the “sentence” to “ids”.

Ok then, do I pass the same sentence_vectorizer to tf.expand_dims, or something else?

No, you pass the sentence_vectorized (the sentence converted to ids) to tf.expand_dims to create a batch dimension.
When you have the batch (of one sentence), you can ask the model to predict the output.
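
A minimal sketch of those two steps together, assuming (as in this Assignment) that the vectorizer returns a rank-1 tensor of ids:

# convert the sentence (a string) into a rank-1 tensor of token ids, e.g. shape (52,)
sentence_vectorized = sentence_vectorizer(sentence)
# add a batch dimension at position 0, giving shape (1, 52)
sentence_vectorized = tf.expand_dims(sentence_vectorized, axis=0)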

Cheers

Seems like I’m reaching a dead end. I realized that the issue isn’t in the expansion itself but in the parameters passed to the model. Does either of the following look right?
output = model(len(tag_map), sentence_vectorizer.vocabulary_size())
or
output = model(len(tag_map), tf.shape(sentence_vectorized))

That is not how you get an output from the TensorFlow model. Please see the same section “4.3 A note on padding” for clues.
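
As a general TensorFlow note (a toy sketch, not the Assignment’s model): values like the number of tags or the vocabulary size are hyperparameters you pass when the model is built; at prediction time you call the already-built model on the input tensor itself.

import tensorflow as tf

# toy model for illustration only; its hyperparameters are fixed at build time
toy_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=100, output_dim=8),  # vocabulary size, embedding size
    tf.keras.layers.Dense(17),                               # number of tags
])
toy_ids = tf.constant([[1, 2, 3]])  # shape (1, 3): one batched “sentence” of ids
toy_output = toy_model(toy_ids)     # call the model on the tensor, not on hyperparameters
print(toy_output.shape)             # (1, 3, 17)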

Thank you so much, this worked!
But I apologize if I’m taking too much of your time; I ran into another error while computing the argmax:

Your function could not be tested due an error. The exception is:
	{{function_node __wrapped__StridedSlice_device_/job:localhost/replica:0/task:0/device:GPU:0}} Index out of range using input dim 2; input has only 2 dims [Op:StridedSlice] name: strided_slice/
 Test failed.

Using the following line:
outputs = tf.argmax(output, axis=1)

No worries @Lujain_Andijani, I’m here to help.

The model in this Assignment outputs the predictions of shape:
(batch size, sequence length, number of tags).
In Exercise 6’s case the batch size is 1 (that’s why we expanded the dimensions), the sequence length is the sentence length, and the number of tags is 17, so you have:
(1, sentence length, 17)
shaped tensor as an output.

So, to answer your question: if you have not modified the output of the model in any way, then the argmax should be taken not on axis=1 but on axis=2 (or axis=-1, where -1 stands for “last”).
In other words, you want the most probable NER tag for each word in the sentence (so the argmax output would have shape (batch size, sequence length), or (1, sentence length)), not the most probable word for each tag, which is what your current code gives: (batch size, number of tags), or (1, 17).

axis=1 would be a valid choice if you had removed the batch dimension prior to taking the argmax, hence my clause “if you have not modified the output of the model”. But that would probably not be in line with the Course tests (I don’t know for sure, since I haven’t explored it).
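
A quick way to see the difference between the two axes (random values, shapes only):

import tensorflow as tf

output = tf.random.uniform((1, 52, 17))  # (batch size, sequence length, number of tags)
print(tf.argmax(output, axis=-1).shape)  # (1, 52): the most probable tag for each word
print(tf.argmax(output, axis=1).shape)   # (1, 17): the most probable word position for each tag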

Cheers


It seems nothing works here. I’m getting the same error mentioned above and don’t even know where it originates from. What can I do to get more help? Can I share my lab with a mentor?

I noticed that other learners had some problems with Exercise 6.

Here is what the entire function should accomplish (you can print your variables and check if they match when running the last cell of the Assignment):

# input
print(sentence)
'Peter Parker , the White House director of trade and manufacturing policy of U.S , said in an interview on Sunday morning that the White House was working to prepare for the possibility of a second wave of the coronavirus in the fall , though he said it wouldn ’t necessarily come'

  • first step is to convert the sentence to ids:
# Convert the sentence into ids
# sentence_vectorized  = ?
print(sentence_vectorized)
<tf.Tensor: shape=(52,), dtype=int64, numpy=
array([ 2428, 24948,     4,     2,   450,   322,  1288,     6,   379,
           9,  2137,   678,     6,  3514,     4,    19,     5,    28,
         763,    15,    89,  1445,    16,     2,   450,   322,    20,
         591,     7,  2742,    12,     2,  2512,     6,     8,   257,
        2005,     6,     2,     1,     5,     2,  1482,     4,  1966,
          39,    19,    36,     1,     1, 19636,   629])>

  • second step is to add fake batch dimension (to get shape (1, 52)):
# Expand its dimension to make it appropriate to pass to the model
# sentence_vectorized = ?
print(sentence_vectorized)
<tf.Tensor: shape=(1, 52), dtype=int64, numpy=
array([[ 2428, 24948,     4,     2,   450,   322,  1288,     6,   379,
            9,  2137,   678,     6,  3514,     4,    19,     5,    28,
          763,    15,    89,  1445,    16,     2,   450,   322,    20,
          591,     7,  2742,    12,     2,  2512,     6,     8,   257,
         2005,     6,     2,     1,     5,     2,  1482,     4,  1966,
           39,    19,    36,     1,     1, 19636,   629]])>

  • next step is to get the model output:
# Get the model output
# output = ?
# the model should output 17 NER tag log probabilities (for every word)
print(output.shape)
TensorShape([1, 52, 17])

# print first three words' prediction outputs
print(output[0, :3, :])
tf.Tensor(
[[-9.08685303e+00 -1.01892262e+01 -6.59615135e+00 -6.39378500e+00
  -1.04870424e+01 -6.25451708e+00 -1.19573958e-02 -7.03884411e+00
  -1.00574131e+01 -1.02226448e+01 -1.09528828e+01 -1.06831293e+01
  -1.16824932e+01 -9.32558250e+00 -5.93654966e+00 -8.28116989e+00
  -5.89271259e+00]
 [-1.03360939e+01 -9.50216103e+00 -7.81255722e+00 -8.22268772e+00
  -1.07739229e+01 -6.91944027e+00 -4.93170786e+00 -1.12327328e+01
  -8.11356544e+00 -8.02027607e+00 -7.17875242e+00 -7.40014744e+00
  -1.04091778e+01 -5.82789326e+00 -1.63617153e-02 -8.13122654e+00
  -6.24460363e+00]
 [-1.41942320e+01 -1.31869612e+01 -1.08198023e+01 -1.51442690e+01
  -1.34834719e+01 -1.01789818e+01 -9.31655693e+00 -1.06511927e+01
  -1.00971813e+01 -1.10133991e+01 -1.10683813e+01 -1.53163004e+01
  -1.31404486e+01 -8.04349327e+00 -7.01542616e+00 -1.05354805e+01
  -1.49805343e-03]], shape=(3, 17), dtype=float32)

  • next step is to check which prediction is highest for each word:
# Get the predicted labels for each token, using argmax function and specifying the correct axis to perform the argmax
# outputs = ?
# you can check the shape
print(outputs.shape)
(1, 52)

# you can check which tag has the highest probability (for each word)
print(outputs)
array([[ 6, 14, 16, 16,  5, 13, 16, 16, 16, 16, 16, 16, 16,  5, 16, 16,
        16, 16, 16, 16,  7, 15, 16, 16,  5, 13, 16, 16, 16, 16, 16, 16,
        16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
        16, 16, 16, 16]])

  • next step is provided for you (removes the fake batch dimension):
# Next line is just to adjust outputs dimension. Since this function expects only one input to get a prediction, outputs will be something like [[1,2,3]]
# so to avoid heavy notation below, let's transform it into [1,2,3]
# you can check the shape (note: not (1, 52) but (52,))
print(outputs.shape)
(52,)

# check the outputs
print(outputs)
array([ 6, 14, 16, 16,  5, 13, 16, 16, 16, 16, 16, 16, 16,  5, 16, 16, 16,
       16, 16, 16,  7, 15, 16, 16,  5, 13, 16, 16, 16, 16, 16, 16, 16, 16,
       16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
       16])

  • next step is to get the text version of NER labels (also provided for you):
# Get a list of all keys, remember that the tag_map was built in a way that each label id matches its index in a list

print(labels)
['B-art', 'B-eve', 'B-geo', 'B-gpe', 'B-nat', 'B-org', 'B-per', 'B-tim', 'I-art', 'I-eve', 'I-geo', 'I-gpe', 'I-nat', 'I-org', 'I-per', 'I-tim', 'O']


  • lastly, you have to implement the loop over the outputs to get the NER text labels (instead of indices):
# Iterate over every predicted token in the outputs list
for tag_idx in ?:  ## loop over all the outputs
    pred_label = ?  ## get the label at index tag_idx
    pred.append(?)  ## append it to our predictions list

This should populate the pred list (one NER tag at a time for every word in the outputs):

['B-per']  # text label at index 6
['B-per', 'I-per'] # text labels at index 6 and 14
['B-per', 'I-per', 'O'] # text labels at index 6, 14 and 16
...

# all the 52 text label predictions
['B-per',
 'I-per',
 'O',
....
 'O',
 'O',
 'O']

This should help you understand what is expected of you. You can also check your implementation’s intermediate values (to see if and where they deviate).
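
If it still fails, a quick sanity check is to print the shape after every step and compare it against the walkthrough above (the expected values assume the 52-token example sentence):

print(sentence_vectorized.shape)  # (52,) after vectorizing, (1, 52) after expanding dims
print(output.shape)               # (1, 52, 17): 17 tag log probabilities per word
print(outputs.shape)              # (1, 52) after the argmax, (52,) after dropping the batch dim
print(len(pred))                  # 52: one text label per word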

Cheers


My model did not work with tf.expand_dims(sentence_vectorized, axis=0); it was giving an error with the LSTM layer. So I used sentence_vectorized directly without tf.expand_dims, and it worked just fine and passed the test.
Lastly, make sure you use pred_label = labels[tag_idx] in the loop. Good luck!