[Week 4] Transformer Network Application: Named-Entity Recognition

Has anyone played around with the model already?

Using the predict method on one example gives me a tuple of length one, containing a numpy array with shape (512, 1, 12) (I am assuming this is max_length, m examples, # classes). It contains numbers ranging from about -10 to 10. I am not sure how to interpret the results. If the highest number indicates the class by its position, the predictions do not match the labels of that example well.

I am wondering if I am meant to add a Dense layer with softmax activation at the end of the model. Looking at the summary, there is already a Dense “Classifier” layer with “multiple” as its output shape.

So I am not sure whether the model is just not sufficiently trained with the 220 examples and the accuracy of 0.75 is not the best metric to capture its performance, or whether I am just missing the proper way to interpret the results from the predict method, or some step in between.

I had a look at the documentation of the model on this page: DistilBERT — transformers 4.5.0.dev0 documentation

I could not find the predict method or an example of how to deal with it. output = model(input) did not work with the input format I was using (test['input_ids'][0]), and I am not sure whether this call is meant to give out predictions anyway.

I would be happy to know if and how anyone managed to get more out of the model than I did.

Hi Vanessa,

The model’s predict method takes a batch of data as input, e.g., test[‘input_ids’][:1] is a batch containing just one (the 1st) example.
According to the documentation, the output is a TFTokenClassifierOutput object whose data member “logits” has shape (batch_size, sequence_length, config.num_labels), i.e., (1, 512, 12) in this case. It also mentions that the logits are classification scores (before SoftMax); thus, you have to apply softmax to the logits to get probabilities, then apply argmax to get a label index, and finally convert the label index to a tag with id2tag[label_index].
BTW, you may want to train for more epochs to get better performance.
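
For example, a minimal sketch (assuming model, test, and id2tag from the assignment are in scope):

import tensorflow as tf

output = model.predict(test['input_ids'][:1])  # batch containing just the 1st example
print(output.logits.shape)                     # (1, 512, 12): batch, seq_len, num_labels

probs = tf.nn.softmax(output.logits[0][0])     # probabilities for the 1st token
label_id = int(tf.math.argmax(probs))          # index of the most likely class
print(id2tag[label_id])                        # human-readable tag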


Hi Edward,
using:
prediction = model.predict(test[‘input_ids’][:1])
I get the following error:

File "", line 1
model.predict(test[‘input_ids’][:1])
^
SyntaxError: invalid character in identifier

Could you help me find what is wrong?
Thanks for your help
Marc

prediction = model.predict(test[‘input_ids’][:1]) # error
prediction = model.predict(test['input_ids'][:1]) # correct

Use straight single quotes. Don’t copy code from the prose of a post, only from a code segment, because the forum renders quotes in regular text as non-ASCII (curly) characters.

Thank you very much. I could have been stuck there for a while.


Like Vanessa, I am trying to figure out how this prediction should be interpreted. Following Edward’s response, I retrained the model for 50 epochs and tried:

prediction = model.predict(test['input_ids'][:1])

for row in prediction.logits[0]:
    probs = tf.nn.softmax(row)      # class probabilities for this token
    label = tf.math.argmax(probs)   # index of the most likely class (avoids shadowing the built-in max)
    print(id2tag[label.numpy()])

This returns 512 items, as follows:

word 1 : Empty
word 2 : Name

word 7 : Name
word 8 : Designation
word 9 : Designation
word 10 : Designation
word 11 : Empty

word 129 : Empty
word 130 : Designation
word 131 : Designation
word 132 : Designation
word 133 : Empty
word 134 : College Name

word 141 : College Name
word 142 : Empty

word 165 : Empty
word 166 : Skills

word 199 : Skills
word 200 : Empty

word 204 : Empty
word 205 : Skills

word 261 : Skills
word 262 : Empty
word 263 : Skills
word 264 : Empty

word 506 : Empty

I am assuming, then, that the model predicted that words 2 through 7 are a name, words 8 through 10 a designation, that it could not figure out what was in words 11 through 129, and so on?

Is this correct?


Yes, you’re right. Compared with the label data, your results are pretty good.
BTW, both the softmax and argmax functions support matrix input, so you can calculate them without a for loop.
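
For instance, a vectorized sketch (reusing prediction and id2tag from your post):

probs = tf.nn.softmax(prediction.logits, axis=-1)   # (1, 512, 12) probabilities
label_ids = tf.math.argmax(probs, axis=-1)          # (1, 512) label indices
tags = [id2tag[i] for i in label_ids[0].numpy()]    # 512 tag names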

Hi edwardyu,

Thanks for your comment. But after following your suggestions, we get a number of tag predictions equal to sequence_length, i.e., 512. Our original sentence length is not necessarily equal to sequence_length. How, then, can we identify the entities associated with the words of the original text?

Prediction is the same as training: all inputs in the same batch must be padded or truncated to the same seq_len, but different batches can have different seq_len. In the exercise, just for convenience, we padded/truncated all data to 512.
If you predict only one input, you can slice the input to its exact seq_len. For instance,

seq_len = 100  # the actual (unpadded) length of this example
prediction = model.predict(tf.convert_to_tensor(test['input_ids'])[:1, :seq_len])

Thanks, edwardyu, for your reply. I had not thought about the fact that I can use a different seq_len for different batches. But the question is: what if the original sentence, after BERT tokenization, exceeds 512? Even for training sentences, for that matter, when the sentence length exceeds that limit we truncate to 512 (or the given max length). How should we handle these cases in reality?

My primary thought on this is to split the original sentence into multiple smaller sentences. This method has its downsides, but it’s the only workaround I can come up with; a rough sketch follows below. Your help is again greatly appreciated. Thanks!
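
Something like this, purely as a sketch (the chunk size and the assumption that the example exceeds 512 tokens are mine, and entities spanning a chunk boundary would be broken up):

max_len = 512
token_ids = test['input_ids'][0]   # one long tokenized example (assumed > 512 tokens)

all_tags = []
for start in range(0, len(token_ids), max_len):
    chunk = tf.convert_to_tensor([token_ids[start:start + max_len]])
    logits = model.predict(chunk).logits             # (1, chunk_len, num_labels)
    label_ids = tf.math.argmax(logits, axis=-1)[0]   # (chunk_len,)
    all_tags.extend(id2tag[i] for i in label_ids.numpy())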

If you train your model from scratch, you can allow a longer sequence length by setting up a DistilBertConfig with a larger value of max_position_embeddings; see the sketch below.
If your pre-trained model is limited to 512 and your sentence length is more than 512, I guess your idea may work sometimes, if you split the sentence carefully. But I didn’t try it.
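
For example (a sketch; the numbers are illustrative, and the model below is randomly initialized rather than pre-trained):

from transformers import DistilBertConfig, TFDistilBertForTokenClassification

config = DistilBertConfig(
    max_position_embeddings=1024,  # default is 512
    num_labels=12,                 # e.g., the 12 tags in this exercise
)
model = TFDistilBertForTokenClassification(config)  # trained from scratch, no pre-trained weights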

Thanks! I will check that.