How were batches with different padded shapes fitted to the model?

I am trying to rewrite the code from C3_W3_Assignment in TensorFlow, and I would like to know how you got generated data with different padding lengths from batch to batch to fit the model.

I have something like this:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    tf.keras.layers.Embedding(32767, 50),  # vocab size 32767, embedding dimension 50
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(10),             # 10 output units, one per tag
    tf.keras.layers.Softmax()
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

model.fit(tf.data.Dataset.zip(test_sentences_vec, test_tags_vec).padded_batch(10), epochs=3)

Outputs

Node: 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits'
logits and labels must have the same first dimension, got logits shape [10,10] and labels shape [20]
	 [[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_train_function_896421]

And this is what is in the data:

>>> list(tf.data.Dataset.zip(test_sentences_vec, test_tags_vec).padded_batch(10).as_numpy_iterator())

[(array([[   1,    0],
         [2151,    0],
         [   1,    0],
         [   1,    0],
         [   1,    0],
         [   1,    0],
         [2652,    1],
         [   1,    0],
         [   1,    0],
         [   1,    0]]),
  array([[ 1128,     0],
         [    1,     0],
         [   47,     0],
         [   36,     0],
         [   52,     0],
         [   52,     0],
         [   32,    32],
         [   45,     0],
         [   45,     0],
         [10206,     0]])),

(array([[    1,     0,     0,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0,     0,     0],
         [    1, 16688,  3431,  1032,  2201,     1,  1115,     1],
         [11009,     1,     0,     0,     0,     0,     0,     0],
         [    1,  1059,     0,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0,     0,     0],
         [18680,     0,     0,     0,     0,     0,     0,     0],
         [ 2084,     0,     0,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0,     0,     0]]),
  array([[   52,     0,     0,     0,     0,     0,     0,     0],
         [   52,     0,     0,     0,     0,     0,     0,     0],
         [   32,    32,    32,    32,    32,    32,    32,    32],
         [   45,    45,     0,     0,     0,     0,     0,     0],
         [   45,    45,     0,     0,     0,     0,     0,     0],
         [10206,     0,     0,     0,     0,     0,     0,     0],
         [ 1128,     0,     0,     0,     0,     0,     0,     0],
         [ 6235,     0,     0,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0,     0,     0],
         [   47,     0,     0,     0,     0,     0,     0,     0]])),

 (array([[    1,     0,     0,     0,     0,     0],
         [20047,     0,     0,     0,     0,     0],
         [ 2820,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0],
         [ 2331,  4945,  1882,     1,  1536,     1],
         [13770, 13773, 24500,     0,     0,     0],
         [    1, 17789,  1012,     1,     0,     0]]),
  array([[1128,    0,    0,    0,    0,    0],
         [6235,    0,    0,    0,    0,    0],
         [   1,    0,    0,    0,    0,    0],
         [  47,    0,    0,    0,    0,    0],
         [  36,    0,    0,    0,    0,    0],
         [  52,    0,    0,    0,    0,    0],
         [  52,    0,    0,    0,    0,    0],
         [  32,   32,   32,   32,   32,   32],
         [  45,   45,   45,    0,    0,    0],
         [  45,   45,   45,   45,    0,    0]])),

The main problem is that tf.keras.layers.Dense(10) seems to accept only a fixed element length per batch (10 is the number of all existing tags). If I do .padded_batch(10, padded_shapes=([10], [10])), everything works.
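For reference, here is that fixed-shape version spelled out (a minimal sketch; MAX_LEN = 10 is just the length I pad to, it only coincidentally matches the tag count):

MAX_LEN = 10  # fixed length every sequence is padded to (my choice, not from the assignment)
dataset = tf.data.Dataset.zip(test_sentences_vec, test_tags_vec).padded_batch(
    10, padded_shapes=([MAX_LEN], [MAX_LEN]))
# note: Dataset.zip(a, b) needs TF >= 2.13; older versions require Dataset.zip((a, b))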

But I checked in your lab, and next(train_generator)[0].shape really does output different shapes from batch to batch. So how was it achieved that your model works fine when the shapes differ from batch to batch?

Hello @someone555777,
To handle data with different padding lengths from batch to batch, you can use the Masking layer in your model. The Masking layer is used to ignore the padded values during the training process, allowing the model to work with variable-length input sequences.

Here’s how you can modify your model to include the Masking layer:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    tf.keras.layers.Embedding(32767, 50),
    tf.keras.layers.Masking(mask_value=0),  # Add the Masking layer
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Softmax()
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.fit(tf.data.Dataset.zip(test_sentences_vec, test_tags_vec).padded_batch(10), epochs=3)

In this example, the Masking layer is added after the Embedding layer. The mask_value parameter is set to 0, which means the layer will mask any timestep whose values are all equal to 0. This is useful when you have padded your input sequences with zeros to make them all the same length.

By adding the Masking layer, your model can now handle input data with different padding lengths from batch to batch, as it will ignore the padded values during training.
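If you want to verify that a mask is really being produced and passed along, here is a minimal sketch (not from the assignment). Keras can also build the mask directly from the integer token ids via Embedding(..., mask_zero=True), which does not depend on the embedded vectors happening to be all zeros:

import tensorflow as tf

emb = tf.keras.layers.Embedding(32767, 50, mask_zero=True)  # mask derived from token id 0
x = tf.constant([[1, 7, 0, 0]])   # one padded sequence
h = emb(x)                        # shape (1, 4, 50)
print(emb.compute_mask(x))        # [[ True  True False False]] -- padding timesteps are masked

Downstream layers such as LSTM consume this mask automatically and skip the masked timesteps.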

I hope I was able to answer your question. Please ask a follow-up question if anything is still unclear.
Regards,
Can Koz

Wow! Cool. I wouldn’t have guessed that for a long time. I will test it, thanks.

But I don’t see that we used masking before training in the initial lab. Is it something Trax-specific?

So, I changed the model to this:

model = keras.Sequential([
    tf.keras.layers.Embedding(32767, 50),
    tf.keras.layers.Masking(mask_value=0),
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(len(contact_type)),
    tf.keras.layers.Softmax()
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

model.fit(tf.data.Dataset.zip(test_sentences_vec, test_tags_vec).padded_batch(10), epochs=3)

And I still get the error:

2023-09-16 15:01:23.670102: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT32
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_FLOAT
    }
  }
}

	for Tuple type infernce function 0
	while inferring type of node 'cond_40/output/_23'
2023-09-16 15:01:26.370708: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8600

...

Node: 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits'
logits and labels must have the same first dimension, got logits shape [10,10] and labels shape [20]
	 [[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_train_function_886769]

And a small remark: the quoted code doesn’t work with ‘sparse_categorical_crossentropy’ either, only with ‘categorical_crossentropy’.
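As I understand it, that is because ‘categorical_crossentropy’ expects one-hot targets while ‘sparse_categorical_crossentropy’ expects integer targets, so the tags have to be one-hot encoded first. A sketch of the conversion (num_tags taken from the Dense layer above):

num_tags = len(contact_type)  # number of distinct tags, as in the Dense layer
dataset = (tf.data.Dataset.zip(test_sentences_vec, test_tags_vec)
           .padded_batch(10)
           .map(lambda x, y: (x, tf.one_hot(y, num_tags))))  # integer tags -> one-hot vectors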

I don’t know how, but my model magically began to train successfully, and even with just these layers:

model = keras.Sequential([
    tf.keras.layers.Embedding(32767, 50),
    tf.keras.layers.Dense(len(contact_type)),
    tf.keras.layers.Softmax()
])
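My guess why this trains: Dense is applied to the last axis of a 3-D input, so with the Embedding in front the model predicts one tag per token, and the logits keep the same time dimension as the padded labels. A quick shape check (assuming the same vocab and tag sizes as above):

x = tf.zeros((10, 8), dtype=tf.int32)                 # a padded batch: (batch, time)
h = tf.keras.layers.Embedding(32767, 50)(x)           # (10, 8, 50)
logits = tf.keras.layers.Dense(len(contact_type))(h)  # (10, 8, num_tags): one prediction per token
# sparse_categorical_crossentropy can now match these logits with labels of shape (10, 8)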

And it doesn’t work without the Embedding layer, by the way. Can you tell me whether there are any options to omit the Embedding if I use batches with different lengths?