How to use tf.repeat and other built-in high-level functions on a Dataset?

someone555777 · October 22, 2023, 12:10pm

I want to do something like the code below for an NER task, which aligns the WordPieces of each word with that word's tag:

import tensorflow as tf

# One row of WordPiece ids per word, and one tag per word.
tokens = tf.ragged.constant([[4], [2, 5, 9]], dtype=tf.int32)
tags = tf.constant([3, 5], dtype=tf.int32)

# Flatten the ragged rows; repeat each tag once per WordPiece of its word.
flat_tokens = tokens.flat_values
duplicated_tags = tf.repeat(tags, tokens.row_lengths())

print(flat_tokens.numpy())  # -> [4 2 5 9]
print(duplicated_tags.numpy())  # -> [3 5 5 5]

But the tokens and tags passed to tf.repeat should be datasets, namely the outputs of TextLineDataset. Is there a minimal way to do this?
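
A minimal sketch of one way this could look with tf.data, zipping two TextLineDataset objects and doing the alignment inside map. The file format here is an assumption and not something stated in the thread: each line of the tokens file is one sentence, words are separated by spaces, and the WordPiece ids of one word are joined by commas; each line of the tags file holds one tag id per word. The file names and the align helper are hypothetical.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical input files (format is an assumption, see the note above).
tmp_dir = tempfile.mkdtemp()
tokens_path = os.path.join(tmp_dir, "tokens.txt")
tags_path = os.path.join(tmp_dir, "tags.txt")
with open(tokens_path, "w") as f:
    f.write("4 2,5,9\n7,8 1\n")
with open(tags_path, "w") as f:
    f.write("3 5\n0 2\n")


def align(token_line, tag_line):
    # Split the sentence into words, then each word into its WordPiece ids.
    words = tf.strings.split(token_line, " ")   # 1-D string tensor
    pieces = tf.strings.split(words, ",")       # RaggedTensor: one row per word
    flat_tokens = tf.strings.to_number(pieces.flat_values, out_type=tf.int32)
    # Parse one tag per word and repeat it once per WordPiece of that word.
    tags = tf.strings.to_number(
        tf.strings.split(tag_line, " "), out_type=tf.int32)
    duplicated_tags = tf.repeat(tags, pieces.row_lengths(), axis=0)
    return flat_tokens, duplicated_tags


dataset = tf.data.Dataset.zip(
    (tf.data.TextLineDataset(tokens_path), tf.data.TextLineDataset(tags_path))
).map(align)

# Collect the results as plain Python lists for inspection.
aligned = [(t.numpy().tolist(), g.numpy().tolist()) for t, g in dataset]
print(aligned)
```

Each element of the resulting dataset has a different length, so batching would likely need something like padded_batch rather than batch.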

balaji.ambresh · October 23, 2023, 5:26am

Please move this question to General Discussion category since it’s not course related.

someone555777 · October 23, 2023, 10:57am

It is directly related to the NLP in TensorFlow course. This topic should be included when you update the course because of the problems we discussed.
It is directly connected to the tokenization topic. Most modern NLP tokenizers are WordPiece tokenizers.

balaji.ambresh · October 23, 2023, 2:55pm

What I meant was that concepts like ragged tensors and wordpiece tokenization aren’t covered in the course material.

someone555777 · October 23, 2023, 2:57pm

And I suggest including them, or at least WordPiece tokenization. As I remember, ragged tensors are discussed in another of your TensorFlow courses.

balaji.ambresh · October 23, 2023, 3:26pm

The staff have been notified about your offer.

chris.favila · November 28, 2023, 6:26am

Hi Mihail. Thank you for the suggestion. Unfortunately, we can’t give you an answer anytime soon on whether this can be included in the course itself. But we’ll take note of it. Thanks again!
