I want to do something like the following code for an NER task, which aligns the WordPieces of each word with that word's tag:
import tensorflow as tf
# Each inner list holds the WordPiece ids of one word.
tokens = tf.ragged.constant([[4], [2, 5, 9]], dtype=tf.int32)
# One tag per word.
tags = tf.constant([3, 5], dtype=tf.int32)
# flat_values drops the word boundaries and keeps the WordPiece ids.
flat_tokens = tokens.flat_values
# Repeat each tag once per WordPiece of its word.
duplicated_tags = tf.repeat(tags, tokens.row_lengths())
print(flat_tokens.numpy()) # -> [4 2 5 9]
print(duplicated_tags.numpy()) # -> [3 5 5 5]
But I need the tokens and tags inputs to tf.repeat to be datasets, i.e. outputs of TextLineDataset. Is there a minimal way to do this?
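One possible approach, as a minimal sketch rather than a definitive answer: zip the two TextLineDataset objects and do the alignment inside a map function. The file names and the line format below are assumptions made only for illustration (one sentence per line, words separated by spaces, the WordPiece ids of one word joined by commas); the real files may be laid out differently.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical input files (an assumption): one sentence per line,
# words separated by spaces, WordPiece ids of one word joined by commas.
tmp_dir = tempfile.mkdtemp()
tokens_path = os.path.join(tmp_dir, "tokens.txt")
tags_path = os.path.join(tmp_dir, "tags.txt")
with open(tokens_path, "w") as f:
    f.write("4 2,5,9\n7,8 1\n")
with open(tags_path, "w") as f:
    f.write("3 5\n6 2\n")


def align(tokens_line, tags_line):
    """Flatten the WordPiece ids and repeat each tag once per WordPiece."""
    words = tf.strings.split(tokens_line, " ")   # 1-D tensor, one entry per word
    pieces = tf.strings.split(words, ",")        # ragged: [num_words, pieces per word]
    tokens = tf.ragged.map_flat_values(
        tf.strings.to_number, pieces, out_type=tf.int32)
    tags = tf.strings.to_number(
        tf.strings.split(tags_line, " "), out_type=tf.int32)
    # row_lengths() gives the number of WordPieces in each word.
    return tokens.flat_values, tf.repeat(tags, tokens.row_lengths())


dataset = tf.data.Dataset.zip(
    (tf.data.TextLineDataset(tokens_path), tf.data.TextLineDataset(tags_path))
).map(align)

results = [(t.numpy().tolist(), g.numpy().tolist()) for t, g in dataset]
for flat_tokens, duplicated_tags in results:
    print(flat_tokens, duplicated_tags)
# -> [4, 2, 5, 9] [3, 5, 5, 5]
# -> [7, 8, 1] [6, 6, 2]
```

Each element has a different length, so batching the result would need something like padded_batch.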
Please move this question to the General Discussion category since it’s not course related.
It is directly related to the NLP in TensorFlow course. This topic should be included in the course when you update it because of the problems we discussed.
It is directly connected to the tokenization topic. Most modern NLP tokenizers are WordPiece tokenizers.
What I meant was that concepts like ragged tensors and wordpiece tokenization aren’t covered in the course material.
And I suggest including them.
At least WordPiece tokenization. As I recall, ragged tensors are discussed in another of your TensorFlow courses.
The staff have been notified about your suggestion.
Hi Mihail. Thank you for the suggestion. Unfortunately, we can’t tell you anytime soon whether this can be included in the course itself. But we’ll take note of it. Thanks again!