The code is working in the Coursera notebook. However, I like to follow along in my own Jupyter Notebook to practice importing datasets, etc.
When I run through the subword tokenization section I get the following error:
----> 5 sequences = sequences.ragged_batch(batch_size=sequences.cardinality())
AttributeError: 'MapDataset' object has no attribute 'ragged_batch'
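From a quick search, my guess (which could be wrong) is that `ragged_batch` was only added to `tf.data.Dataset` in TensorFlow 2.13, so an older install simply wouldn't have the method. This is the check I ran on my machine:

```python
import tensorflow as tf

# Print the installed version and check whether the method exists at all.
# My assumption (unverified) is that ragged_batch arrived in TF 2.13.
print(tf.__version__)
print(hasattr(tf.data.Dataset, "ragged_batch"))
```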
I asked ChatGPT for a workaround, and it suggested the following:
How to Handle Ragged Data Correctly
Since you seem to be working with variable-length data and are perhaps looking to batch them efficiently, here's how you can handle ragged data without using a non-existent ragged_batch method:
Option 1: Using padded_batch to Pad Sequences
This is the most straightforward method if your model does not natively support ragged tensors:
# Suppose `train_data` is your dataset
train_sequences = train_data.map(lambda text: vectorize_layer(text))
# Pad the sequences to a fixed length
train_padded = train_sequences.padded_batch(batch_size=32, padded_shapes=(MAX_LENGTH,), padding_values=0)
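To make sure I understand what padded_batch is doing, I wrote myself a plain-Python sketch of the padding step (the token ids and MAX_LENGTH here are made up, just to see the shapes):

```python
MAX_LENGTH = 5  # made-up fixed length, standing in for the course's value

def pad_sequence(seq, max_length=MAX_LENGTH, pad_value=0):
    """Truncate to max_length, then right-pad with pad_value."""
    seq = seq[:max_length]
    return seq + [pad_value] * (max_length - len(seq))

batch = [[4, 8, 15], [16, 23], [42, 4, 8, 15, 16, 23]]
padded = [pad_sequence(s) for s in batch]
# Every row now has length MAX_LENGTH:
# [[4, 8, 15, 0, 0], [16, 23, 0, 0, 0], [42, 4, 8, 15, 16]]
```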
Option 2: Convert to Ragged Tensor After Batching
If you prefer to keep the data in its ragged form and handle it as such within your model:
# Batch the data first without padding
train_batched = train_sequences.batch(32)
# Convert each batch to a ragged tensor
train_ragged = train_batched.map(lambda x: tf.RaggedTensor.from_tensor(x, padding=0))
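If I understand Option 2 correctly, RaggedTensor.from_tensor(x, padding=0) just drops the trailing zeros from each row to recover the variable lengths. A plain-Python sketch of that recovery step, as I understand it:

```python
def strip_padding(row, pad_value=0):
    """Drop trailing pad_value entries from a row, mimicking (as I
    understand it) tf.RaggedTensor.from_tensor(..., padding=0)."""
    end = len(row)
    while end > 0 and row[end - 1] == pad_value:
        end -= 1
    return row[:end]

padded_rows = [[4, 8, 15, 0, 0], [16, 23, 0, 0, 0]]
ragged_rows = [strip_padding(r) for r in padded_rows]
# → [[4, 8, 15], [16, 23]]
```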
The problem is that the padded_batch approach (Option 1, with MAX_LENGTH = 120) then gives the following error:
ValueError: The padded shape (120,) is not compatible with the shape () of the corresponding input component.
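I think I can reproduce that second error with a tiny standalone dataset: if the dataset elements are scalars (shape ()), for example raw strings that never got vectorized, then padded_batch with padded_shapes=(120,) has nothing one-dimensional to pad. This is just my guess at the cause, not a confirmed diagnosis:

```python
import tensorflow as tf

# Elements here are scalar strings, so each element has shape ().
ds = tf.data.Dataset.from_tensor_slices(["first sentence", "second"])

try:
    ds.padded_batch(batch_size=2, padded_shapes=(120,))
except ValueError as e:
    print(e)  # complains that (120,) is incompatible with shape ()
```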
For my own understanding, can you provide some guidance on what's happening on my machine? Am I using the wrong TensorFlow version, or missing a download?
Thanks
Josh