I have a problem with Exercise 3, where we need to complete the function fit_label_encoder. The error I get is: Slicing dataset elements is not supported for rank 0.
Maybe the line where I define labels is incorrect. I define labels by calling
the function tf.data.Dataset.from_tensor_slices() directly with the pair (train_labels, validation_labels) as input.
My understanding is that I should concatenate train_labels and validation_labels, and that the way to do this is via the tf.data.Dataset.from_tensor_slices() function. The error I receive is "Slicing dataset elements is not supported for rank 0", and it is not clear to me what exactly I am doing wrong.
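In other words, my line looks roughly like this (simplified, not my exact cell):

labels = tf.data.Dataset.from_tensor_slices((train_labels, validation_labels))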
The label encoder is supposed to adapt to labels across both the training and validation sets. If the training labels are ['label 1', 'label 2'] and the validation labels are ['label 1', 'label 3'], the encoder should learn from ['label 1', 'label 2', 'label 1', 'label 3']. Passing both lists to Dataset.from_tensor_slices at once doesn't do this. See the difference:
>>> import numpy as np
>>> import tensorflow as tf
>>> train_labels = ['label 1', 'label 2']
>>> val_labels = ['label 1', 'label 3']
# this creates a 2D array
>>> print(np.asarray(list(tf.data.Dataset.from_tensor_slices([train_labels, val_labels]).as_numpy_iterator())))
[[b'label 1' b'label 2']
 [b'label 1' b'label 3']]
# this creates a 1D array which is what we want
>>> train_dataset = tf.data.Dataset.from_tensor_slices(train_labels)
>>> val_dataset = tf.data.Dataset.from_tensor_slices(val_labels)
>>> print(list(train_dataset.concatenate(val_dataset).as_numpy_iterator()))
[b'label 1', b'label 2', b'label 1', b'label 3']
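Once you have the concatenated dataset, adapting the encoder is just a matter of calling adapt() on it. A minimal sketch, assuming a StringLookup layer with num_oov_indices=0 (i.e. no OOV token; adjust to whatever the exercise specifies):

>>> label_encoder = tf.keras.layers.StringLookup(num_oov_indices=0)
>>> label_encoder.adapt(train_dataset.concatenate(val_dataset))
>>> print(len(label_encoder.get_vocabulary()))   # 3 unique labels learned across both splits
3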
As far as the rank error is concerned, it seems like you're passing a scalar somewhere. The rank of a tensor is its number of dimensions, which is what tf.rank returns. Here are a few examples:
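(A quick sketch run in the same TF 2.x session as above; the string values are just placeholders. If I am reading the message right, the error most likely comes from from_tensor_slices being handed a rank-0, i.e. scalar, tensor.)

>>> print(tf.rank(tf.constant('label 1')))                # a single string is a scalar: rank 0
tf.Tensor(0, shape=(), dtype=int32)
>>> print(tf.rank(tf.constant(['label 1', 'label 2'])))   # a list of strings: rank 1
tf.Tensor(1, shape=(), dtype=int32)
>>> tf.data.Dataset.from_tensor_slices(tf.constant('label 1'))   # slicing a scalar reproduces your error
ValueError: Slicing dataset elements is not supported for rank 0 [...]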
I appreciate your responses, and thanks a lot for these clarifications, but unfortunately I still have not completed the exercise. Moreover, I still get exactly the same error.
To summarize my work (a rough sketch in code follows this list):
I apply the decode_labels function to the input train_labels and validation_labels
Following your suggestion, I apply tf.data.Dataset.from_tensor_slices() to train_labels and validation_labels and get train_dataset and val_dataset, respectively.
I define labels = train_dataset.concatenate(val_dataset)
I define label_encoder = tf.keras.layers.StringLookup(num_oov_indices=0); I set num_oov_indices=0 in order to remove the OOV token.
I adapt label_encoder by using label_encoder.adapt(labels)
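In code, my attempt looks roughly like this (a simplified sketch: I leave out the decode_labels mapping from the first step and assume that train_labels and validation_labels are the arrays of label strings passed into fit_label_encoder):

import tensorflow as tf

def fit_label_encoder(train_labels, validation_labels):
    # Wrap each set of labels in a dataset of individual string elements
    train_dataset = tf.data.Dataset.from_tensor_slices(train_labels)
    val_dataset = tf.data.Dataset.from_tensor_slices(validation_labels)

    # Concatenate so the encoder sees labels from both splits
    labels = train_dataset.concatenate(val_dataset)

    # StringLookup with no OOV token, adapted on the combined labels
    label_encoder = tf.keras.layers.StringLookup(num_oov_indices=0)
    label_encoder.adapt(labels)

    return label_encoder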
Could you point out which step is wrong? In particular, where could I be passing a scalar, given that I still get the error that slicing dataset elements is not supported for rank 0?
Thanks a lot for your help, it works now. I have one remaining point of confusion: why does the fit_label_encoder() function contain the following decode_labels() function if it is not actually needed?
def decode_labels(label):
    # Decode byte string to a UTF-8 string
    label = tf.strings.unicode_decode(label, "UTF-8")
    return label

# Apply the decode function to both train_labels and validation_labels
train_labels = train_labels.map(decode_labels)
validation_labels = validation_labels.map(decode_labels)
Is there an alternative solution where the decode_labels() function is actually used?
I thought that the decode_labels() function was in the code from the beginning. How can I check this? Is there a way to get a completely fresh copy of the lab, with no modifications, that I can work on?