Tokenize_labels function

Hi,

Would you mind giving me a hint on the “tokenize_labels” function, please?

Here is my output:

First 5 labels of the training set should look like this:
[[3]
 [1]
 [0]
 [0]
 [4]]

First 5 labels of the validation set should look like this:
[[3]
 [1]
 [0]
 [0]
 [4]]

Tokenized labels of the training set have shape: (2225, 1)

Tokenized labels of the validation set have shape: (2225, 1)

However, the expected output is:

First 5 labels of the training set should look like this:
[[3]
 [1]
 [0]
 [0]
 [4]]

First 5 labels of the validation set should look like this:
[[4]
 [3]
 [2]
 [0]
 [0]]

Tokenized labels of the training set have shape: (1780, 1)

Tokenized labels of the validation set have shape: (445, 1)

1780 + 445 = 2225. Did you forget to split the labels into training and validation sets?
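The counts are consistent with an 80/20 split of the 2225 examples. Here is a quick arithmetic check, assuming a training fraction of 0.8. The fraction is inferred from the numbers, not stated in the thread.

```python
# Assumption: the data is split with a training fraction of 0.8.
total_examples = 2225
training_split = 0.8

train_size = int(total_examples * training_split)  # examples used for training
val_size = total_examples - train_size              # the remainder is validation

print(train_size, val_size)  # prints: 1780 445
```

If the label arrays still have 2225 rows, the labels were never split, even though the sentences were.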

Hi,

Everything seems fine up to this function:

There are 1780 sentences for training.

There are 1780 labels for training.

There are 445 sentences for validation.

There are 445 labels for validation.

Padded training sequences have shape: (1780, 120)

Padded validation sequences have shape: (445, 120)

So, in tokenize_labels I have:

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer

def tokenize_labels(all_labels, split_labels):
    # Instantiate the Tokenizer (no additional arguments needed)
    label_tokenizer = Tokenizer()
    # Fit the tokenizer on all the labels
    label_tokenizer.fit_on_texts(all_labels)

    # Convert labels to sequences
    label_seq = label_tokenizer.texts_to_sequences(split_labels)

    # Convert sequences to a NumPy array; subtract 1 so class indices start at 0
    label_seq_np = np.array(label_seq) - 1

    return label_seq_np

I am getting this warning now:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  label_seq_np = np.array(label_seq) - 1
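The warning means label_seq is not rectangular. Each label is a single word. The Keras Tokenizer numbers words starting at 1, so label sequences should be one-element lists, and subtracting 1 shifts them to 0-based class indices. If sentences end up in the label list, the inner lists have different lengths. The sketch below uses hard-coded, made-up sequences in place of real tokenizer output. The label values were chosen to reproduce the expected first five training labels.

```python
import numpy as np

# Hypothetical tokenizer output for five single-word labels
# (Tokenizer indices start at 1, so five classes map to 1..5).
label_seq = [[4], [2], [1], [1], [5]]

# Every inner list has length 1, so NumPy builds a rectangular array.
label_seq_np = np.array(label_seq) - 1  # shift indices from 1..5 to 0..4
print(label_seq_np.shape)    # (5, 1)
print(label_seq_np.ravel())  # [3 1 0 0 4]

# Hypothetical output if sentences were passed in place of labels:
# the inner lists have different lengths, i.e. the sequence is ragged.
sentence_seq = [[12, 7, 93], [5, 41], [8, 2, 66, 19]]
lengths = sorted({len(seq) for seq in sentence_seq})
print(lengths)               # [2, 3, 4]
# np.array(sentence_seq) warns on older NumPy versions and raises
# ValueError on NumPy 1.24+, because no rectangular shape exists.
```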

Please click my name and message your notebook as an attachment.
Don’t forget to remove code from your posts on this thread.

Please fix the bugs in train_val_split.
Labels, as the name says, should be used to create the training and validation labels. Similarly, sentences should be used to create the training and validation sentences.
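Here is a minimal sketch of that split. It assumes the function receives parallel lists sentences and labels plus a training fraction training_split. This signature is an assumption, so check it against your notebook. The key point is that both lists are cut at the same index, with labels taken only from labels and sentences only from sentences.

```python
def train_val_split(sentences, labels, training_split):
    """Split parallel lists of sentences and labels at the same index.

    Sketch only: the signature is assumed, not taken from the assignment.
    """
    # Number of examples that go to the training set.
    train_size = int(len(sentences) * training_split)

    # Sentences are split from `sentences` ...
    train_sentences = sentences[:train_size]
    validation_sentences = sentences[train_size:]

    # ... and labels from `labels`, so each sentence keeps its own label.
    train_labels = labels[:train_size]
    validation_labels = labels[train_size:]

    return train_sentences, validation_sentences, train_labels, validation_labels


# Usage with toy data (not the real dataset).
sents = [f"sentence {i}" for i in range(10)]
labs = ["sport", "tech", "business", "business", "politics",
        "sport", "entertainment", "tech", "sport", "business"]
tr_s, va_s, tr_l, va_l = train_val_split(sents, labs, 0.8)
print(len(tr_s), len(va_s), len(tr_l), len(va_l))  # 8 2 8 2
```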


Thank you! I really appreciate your help.

Hi @balaji.ambresh ,

Sorry, just one last question.

Here’s my model:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(num_words, embedding_dim, input_length=maxlen),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(5, activation='softmax')
])

model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),  # try: MAE, MSE
              optimizer='adam',
              metrics=['accuracy'])

When I run:

model = create_model(NUM_WORDS, EMBEDDING_DIM, MAXLEN)

history = model.fit(train_padded_seq, train_label_seq, epochs=30, validation_data=(val_padded_seq, val_label_seq))

I’m getting:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Volumes/GoogleDrive-107870226340842906740/My Drive/My_Research/Intro_Tensorflow/tensorflow-1-public/C3/W2/assignment/C3W2_Assignment.ipynb Cell 30 in <cell line: 3>()
      1 model = create_model(NUM_WORDS, EMBEDDING_DIM, MAXLEN)
----> 3 history = model.fit(train_padded_seq, train_label_seq, epochs=30, validation_data=(val_padded_seq, val_label_seq))

File ~/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # tf.debugging.disable_traceback_filtering()
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File /var/folders/61/2rsxbqtx2zqbwlb914ssc_zm0000gn/T/__autograph_generated_fileayf1gjrd.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
     13 try:
     14     do_return = True
---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16 except:
     17     do_return = False

ValueError: in user code:

    File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/training.py", line 1160, in train_function  *
        return step_function(self, iterator)
    File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/training.py", line 1146, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/training.py", line 1135, in run_step  **
        outputs = model.train_step(data)
    File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/training.py", line 994, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/training.py", line 1052, in compute_loss
        return self.compiled_loss(
    File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/compile_utils.py", line 265, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/losses.py", line 152, in __call__
        losses = call_fn(y_true, y_pred)
    File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/losses.py", line 272, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/losses.py", line 1990, in categorical_crossentropy
        return backend.categorical_crossentropy(
    File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/backend.py", line 5529, in categorical_crossentropy
        target.shape.assert_is_compatible_with(output.shape)

    ValueError: Shapes (None, 1) and (None, 5) are incompatible

It appears that the shapes are not compatible, but I don’t know which ones it is not happy with.

thank you!!
Ed
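The two shapes in the error are the target and the model output, in that order. train_label_seq holds one integer per example, so a batch has shape (batch, 1). The final Dense(5, activation='softmax') layer outputs five probabilities per example, so its shape is (batch, 5). CategoricalCrossentropy expects one-hot targets with the same shape as the output. The NumPy sketch below uses made-up labels to illustrate the difference.

```python
import numpy as np

# Hypothetical batch of integer labels, shaped like train_label_seq.
int_labels = np.array([[3], [1], [0], [0], [4]])
print(int_labels.shape)  # (5, 1)  <- the "(None, 1)" in the error

num_classes = 5
# One-hot encoding: row i has a 1 in column int_labels[i] and 0 elsewhere.
one_hot = np.eye(num_classes)[int_labels.ravel()]
print(one_hot.shape)     # (5, 5)  <- matches the softmax output "(None, 5)"
```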

The labels are encoded as integers in [0, 4], i.e. they are not one-hot encoded, so each target has shape (None, 1) while the softmax output has shape (None, 5). Please fix the loss function. Don’t forget to remove code from your post.
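For integer targets, Keras provides tf.keras.losses.SparseCategoricalCrossentropy() (also available as the string 'sparse_categorical_crossentropy'). The alternative is to one-hot encode the labels and keep CategoricalCrossentropy. Both give the same value. The NumPy sketch below checks this on made-up predictions. It is a hand-written version of the formulas and ignores numerical details such as the clipping Keras applies; it is not Keras's implementation.

```python
import numpy as np

# Made-up softmax outputs for 3 examples and 5 classes (each row sums to 1).
y_pred = np.array([
    [0.10, 0.20, 0.30, 0.30, 0.10],
    [0.05, 0.80, 0.05, 0.05, 0.05],
    [0.60, 0.10, 0.10, 0.10, 0.10],
])
int_labels = np.array([3, 1, 0])  # integer targets in [0, 4]

# Sparse categorical cross-entropy: -log of the probability at the true index.
sparse_ce = -np.log(y_pred[np.arange(len(int_labels)), int_labels]).mean()

# Categorical cross-entropy on one-hot targets: -sum(y_true * log(y_pred)).
one_hot = np.eye(5)[int_labels]
categorical_ce = -(one_hot * np.log(y_pred)).sum(axis=1).mean()

print(np.isclose(sparse_ce, categorical_ce))  # True
```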