C3W2 Assignment - fit_label_encoder

Hello,

I have problems with Exercise 3 where we need to complete the function fit_label_encoder. The error that I get is: Slicing dataset elements is not supported for rank 0.

Maybe the line of code where I define labels is incorrect. I define labels using the function tf.data.Dataset.from_tensor_slices() directly, with the pair (train_labels, validation_labels) as input.

Can someone help me? Thank you in advance.

MapDataset supports concatenate operation. Does this hint help?

My understanding is that I should concatenate train_labels and validation_labels and the way to do this is via the tf.data.Dataset.from_tensor_slices() function. The error I receive is that “Slicing dataset elements is not supported for rank 0”. It is not clear to me what exactly I am doing wrong.

The label encoder is supposed to adapt to the labels across both the training and validation sets. If the training labels are ['label 1', 'label 2'] and the validation labels are ['label 1', 'label 3'], the encoder should learn from ['label 1', 'label 2', 'label 1', 'label 3']. Dataset.from_tensor_slices doesn’t do this. See the difference:

>>> import numpy as np
>>> import tensorflow as tf
>>> train_labels = ['label 1', 'label 2']
>>> val_labels = ['label 1', 'label 3']
# this creates a 2D array
>>> print(np.asarray(list(tf.data.Dataset.from_tensor_slices([train_labels, val_labels]).as_numpy_iterator())))
[[b'label 1' b'label 2']
 [b'label 1' b'label 3']]
 
# this creates a 1D array which is what we want
>>> train_dataset = tf.data.Dataset.from_tensor_slices(train_labels)
>>> val_dataset = tf.data.Dataset.from_tensor_slices(val_labels)
>>> print(list(train_dataset.concatenate(val_dataset).as_numpy_iterator()))
[b'label 1', b'label 2', b'label 1', b'label 3']

As far as the rank error is concerned, it seems like you’re passing a scalar somewhere. The rank of a tensor is its number of dimensions, and tf.rank returns it. Here are a few examples:

# scalar
>>> tf.rank(1).numpy()
0
# vector
>>> tf.rank([1]).numpy()
1
# 2D matrix
>>> tf.rank([[1]]).numpy()
2
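The same check can be made on a tf.data.Dataset by reading its element_spec, which reports the shape of each element. A small sketch, assuming made-up labels rather than the assignment's data:

```python
import tensorflow as tf

# Made-up labels, for illustration only (not the assignment's data).
labels = tf.data.Dataset.from_tensor_slices(["label 1", "label 2"])

# True when the object is already a Dataset.
is_dataset = isinstance(labels, tf.data.Dataset)

# Each element is a single string, so its shape is () and its rank is 0.
spec = labels.element_spec
print(is_dataset)
print(spec.shape.rank)
print(spec.dtype)
```

If the isinstance check is already True for your labels, they do not need another conversion.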

I appreciate your responses, thanks a lot for these clarifications, but unfortunately I still did not complete the exercise. Moreover, I still have exactly the same error.

To summarize my work:

  1. I use the decode_label function on the input train_labels and validation_labels
  2. Following your suggestion, I apply tf.data.Dataset.from_tensor_slices() to train_labels and validation_labels and get train_dataset and val_dataset, respectively.
  3. I define labels = train_dataset.concatenate(val_dataset)
  4. I define label_encoder = tf.keras.layers.StringLookup(num_oov_indices=0). I set num_oov_indices=0 in order to remove the OOV tokens.
  5. I adapt label_encoder by using label_encoder.adapt(labels)

Could you point out to me which step is wrong? In particular, how can I pass a scalar somewhere since I still get the error that slicing dataset elements is not supported for rank 0?

The steps below aren’t required since train_labels and validation_labels are already Datasets:

The reason I used Dataset.from_tensor_slices was to convert a Python list to a TensorSliceDataset.
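Put together, the flow discussed in this thread might look like the sketch below. It assumes train_labels and validation_labels are already Datasets of scalar strings; the toy label values are hypothetical stand-ins, not the assignment's data.

```python
import tensorflow as tf

# Hypothetical stand-ins for the assignment's label datasets.
train_labels = tf.data.Dataset.from_tensor_slices(["sport", "tech"])
validation_labels = tf.data.Dataset.from_tensor_slices(["sport", "politics"])

# Two Datasets are joined with concatenate; no extra
# from_tensor_slices call is needed on objects that are already Datasets.
labels = train_labels.concatenate(validation_labels)

# num_oov_indices=0 removes the out-of-vocabulary slot.
label_encoder = tf.keras.layers.StringLookup(num_oov_indices=0)
label_encoder.adapt(labels)

# The vocabulary holds every distinct label seen in either dataset.
vocabulary = label_encoder.get_vocabulary()
print(vocabulary)
```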

Thanks a lot for your help, it works. Now I have the following confusion: why does the fit_label_encoder() function contain the following decode_labels() function if it is not actually needed?

def decode_labels(label):
    # Decode byte string to a UTF-8 string
    label = tf.strings.unicode_decode(label, "UTF-8")
    return label

# Apply the decode function to both train_labels and validation_labels
train_labels = train_labels.map(decode_labels)
validation_labels = validation_labels.map(decode_labels)

Is there an alternative solution where the decode_labels() function is actually used?

I don’t see a function decode_labels in C3W2 assignment starter code. Am I missing something?

I thought that the decode_labels() function was in the code from the beginning. How can I check this? Is there a way to get a completely new lab with no modifications that I can work on?

  • Open your current notebook.
  • Go to “File->Rename”, and rename the notebook.
  • Go to the Lab Help menu (the question-mark inside a circle), and use “Get latest version”.
  • Go to “File->Open” menu, and open the new notebook.
  • Use the “Kernel->Restart & Clear All Output” command.

Now the new notebook is ready for you to use.

Note: Do not rename the new notebook. The grader always uses the notebook with the original file name.
