Tokenize_labels function

Ed_Sykes · September 13, 2022, 7:34pm

Hi,

Would you mind giving me a hint on the “tokenize_labels” function, please?

Here is my output:

First 5 labels of the training set should look like this:
[[3]
[1]
[0]
[0]
[4]]

First 5 labels of the validation set should look like this:
[[3]
[1]
[0]
[0]
[4]]

Tokenized labels of the training set have shape: (2225, 1)

Tokenized labels of the validation set have shape: (2225, 1)

whereas the expected output is:

First 5 labels of the training set should look like this:
[[3]
 [1]
 [0]
 [0]
 [4]]

First 5 labels of the validation set should look like this:
[[4]
 [3]
 [2]
 [0]
 [0]]

Tokenized labels of the training set have shape: (1780, 1)

Tokenized labels of the validation set have shape: (445, 1)

balaji.ambresh · September 14, 2022, 1:25pm

1780 + 445 = 2225. Did you forget to split the labels into training and validation sets?

Ed_Sykes · September 14, 2022, 3:12pm

Hi,

Everything seems fine up to this function:

There are 1780 sentences for training.

There are 1780 labels for training.

There are 445 sentences for validation.

There are 445 labels for validation.

Padded training sequences have shape: (1780, 120)

Padded validation sequences have shape: (445, 120)

So, in “def tokenize_labels(all_labels, split_labels):” I have:

    # Instantiate the Tokenizer (no additional arguments needed)
    # Fit the tokenizer on all the labels
    label_tokenizer.fit_on_texts(all_labels)

    # Convert labels to sequences
    label_seq = label_tokenizer.texts_to_sequences(split_labels)

    # Convert sequences to a numpy array
    label_seq_np = np.array(label_seq) - 1

I am getting this warning now:
“VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
label_seq_np = np.array(label_seq) - 1”

balaji.ambresh · September 14, 2022, 3:46pm

Please click my name and message your notebook as an attachment.
Don’t forget to remove code from your posts on this thread.

balaji.ambresh · September 14, 2022, 3:55pm

Please fix the bugs in train_val_split.
Labels, as the name says, should be used to create the training and validation labels. Similarly, sentences should be used to create the training and validation sentences.
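For readers who land here with the same symptom, a minimal sketch of the kind of split being described, using plain Python lists. The signature train_val_split(sentences, labels, training_split) and the toy data are assumptions for illustration, not the assignment's reference solution. The point is only that each output is sliced from its own input at the same index:

```python
def train_val_split(sentences, labels, training_split):
    """Split two parallel lists at the same index (hypothetical signature)."""
    train_size = int(len(sentences) * training_split)

    # Sentences are sliced only from `sentences` ...
    train_sentences = sentences[:train_size]
    validation_sentences = sentences[train_size:]

    # ... and labels are sliced only from `labels`.
    train_labels = labels[:train_size]
    validation_labels = labels[train_size:]

    return train_sentences, validation_sentences, train_labels, validation_labels


# Toy data standing in for the real dataset (2225 items, as in the thread).
sentences = [f"some article text number {i}" for i in range(2225)]
labels = ["sport", "tech", "business", "politics", "entertainment"] * 445

train_s, val_s, train_l, val_l = train_val_split(sentences, labels, 0.8)
print(len(train_s), len(train_l))  # 1780 1780
print(len(val_s), len(val_l))      # 445 445

# Sanity check: every label is a single word, so each one tokenizes
# to a sequence of length 1.
assert all(len(label.split()) == 1 for label in train_l + val_l)
```

If sentences are sliced into the label variables by mistake, the counts still look right, but each “label” is now a multi-word text. Converting those to sequences would likely give lists of different lengths, which is consistent with the ragged-array warning above.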

Ed_Sykes · September 14, 2022, 6:21pm

Thank you! I really appreciate your help.

Ed_Sykes · September 14, 2022, 7:01pm

Hi @balaji.ambresh ,

Sorry, just one last question.

Here’s my model:

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(num_words, embedding_dim, input_length=maxlen),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(24, activation='relu'),
        tf.keras.layers.Dense(5, activation='softmax')
    ])

    model.compile(loss=tf.keras.losses.CategoricalCrossentropy(),  # try: MAE, MSE
                  optimizer='adam',
                  metrics=['accuracy'])

When I run:

    model = create_model(NUM_WORDS, EMBEDDING_DIM, MAXLEN)

    history = model.fit(train_padded_seq, train_label_seq, epochs=30, validation_data=(val_padded_seq, val_label_seq))

I’m getting:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/Volumes/GoogleDrive-107870226340842906740/My Drive/My_Research/Intro_Tensorflow/tensorflow-1-public/C3/W2/assignment/C3W2_Assignment.ipynb Cell 30 in <cell line: 3>()
      1 model = create_model(NUM_WORDS, EMBEDDING_DIM, MAXLEN)
----> 3 history = model.fit(train_padded_seq, train_label_seq, epochs=30, validation_data=(val_padded_seq, val_label_seq))

File ~/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67 filtered_tb = _process_traceback_frames(e.__traceback__)
     68 # To get the full stack trace, call:
     69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
     71 finally:
     72 del filtered_tb

File /var/folders/61/2rsxbqtx2zqbwlb914ssc_zm0000gn/T/autograph_generated_fileayf1gjrd.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
     13 try:
     14 do_return = True
---> 15 retval = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16 except:
     17 do_return = False

ValueError: in user code:

File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/training.py", line 1160, in train_function  *
    return step_function(self, iterator)
File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/training.py", line 1146, in step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/training.py", line 1135, in run_step  **
    outputs = model.train_step(data)
File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/training.py", line 994, in train_step
    loss = self.compute_loss(x, y, y_pred, sample_weight)
File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/training.py", line 1052, in compute_loss
    return self.compiled_loss(
File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/engine/compile_utils.py", line 265, in __call__
    loss_value = loss_obj(y_t, y_p, sample_weight=sw)
File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/losses.py", line 152, in __call__
    losses = call_fn(y_true, y_pred)
File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/losses.py", line 272, in call  **
    return ag_fn(y_true, y_pred, **self._fn_kwargs)
File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/losses.py", line 1990, in categorical_crossentropy
    return backend.categorical_crossentropy(
File "/Users/edsykes/opt/anaconda3/envs/tf2_python_3_8_13/lib/python3.10/site-packages/keras/backend.py", line 5529, in categorical_crossentropy
    target.shape.assert_is_compatible_with(output.shape)

ValueError: Shapes (None, 1) and (None, 5) are incompatible

It appears that the shapes are not compatible, but I don’t know which ones it is not happy with.

thank you!!
Ed

balaji.ambresh · September 14, 2022, 7:30pm

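The labels are encoded as integers in [0, 4], i.e., they are not one-hot encoded. (None, 1) is the shape of your labels and (None, 5) is the shape of the softmax output. Please fix the loss function so that it accepts integer labels. Don’t forget to remove code from your post.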
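To illustrate the difference between the two label encodings, here is a small pure-Python sketch with no TensorFlow involved. The helper names one_hot, categorical_crossentropy and sparse_categorical_crossentropy are written out by hand for illustration and are not the Keras implementations. It shows that a cross-entropy taking an integer class index gives the same value as one taking a one-hot vector, and that they differ only in the label shape they expect:

```python
import math


def one_hot(label, num_classes):
    """Turn an integer class index into a one-hot list, e.g. 3 -> [0, 0, 0, 1, 0]."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_crossentropy(y_true_one_hot, y_pred):
    """Cross-entropy when the target is a one-hot vector (same length as y_pred)."""
    return -sum(t * math.log(p) for t, p in zip(y_true_one_hot, y_pred))


def sparse_categorical_crossentropy(y_true_index, y_pred):
    """Cross-entropy when the target is a single integer class index."""
    return -math.log(y_pred[y_true_index])


# One softmax output over 5 classes, and an integer label in [0, 4].
y_pred = [0.1, 0.2, 0.1, 0.5, 0.1]
label = 3

loss_from_index = sparse_categorical_crossentropy(label, y_pred)
loss_from_one_hot = categorical_crossentropy(one_hot(label, 5), y_pred)

# Both print the same value, roughly 0.693 (that is, -log(0.5)).
print(loss_from_index, loss_from_one_hot)
```

In Keras, the integer-label variant is tf.keras.losses.SparseCategoricalCrossentropy (string alias 'sparse_categorical_crossentropy'). The other option is to keep the categorical loss and one-hot encode the labels, for example with tf.keras.utils.to_categorical.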
