How were batches with different padded shapes fitted to the model?

I am trying to rewrite the code from C3_W3_Assignment in TensorFlow, and I would like to know how you got generated data with different padding lengths from batch to batch to fit the model.

I have something like this:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    tf.keras.layers.Embedding(32767, 50),  # vocab size 32767, embedding dimension 50
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(10),             # 10 output units, one per tag
    tf.keras.layers.Softmax()
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

model.fit(tf.data.Dataset.zip(test_sentences_vec, test_tags_vec).padded_batch(10), epochs=3)

Outputs

Node: 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits'
logits and labels must have the same first dimension, got logits shape [10,10] and labels shape [20]
	 [[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_train_function_896421]

And this is what is in the data:

>>> list(tf.data.Dataset.zip(test_sentences_vec, test_tags_vec).padded_batch(10).as_numpy_iterator())

[(array([[   1,    0],
         [2151,    0],
         [   1,    0],
         [   1,    0],
         [   1,    0],
         [   1,    0],
         [2652,    1],
         [   1,    0],
         [   1,    0],
         [   1,    0]]),
  array([[ 1128,     0],
         [    1,     0],
         [   47,     0],
         [   36,     0],
         [   52,     0],
         [   52,     0],
         [   32,    32],
         [   45,     0],
         [   45,     0],
         [10206,     0]])),

(array([[    1,     0,     0,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0,     0,     0],
         [    1, 16688,  3431,  1032,  2201,     1,  1115,     1],
         [11009,     1,     0,     0,     0,     0,     0,     0],
         [    1,  1059,     0,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0,     0,     0],
         [18680,     0,     0,     0,     0,     0,     0,     0],
         [ 2084,     0,     0,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0,     0,     0]]),
  array([[   52,     0,     0,     0,     0,     0,     0,     0],
         [   52,     0,     0,     0,     0,     0,     0,     0],
         [   32,    32,    32,    32,    32,    32,    32,    32],
         [   45,    45,     0,     0,     0,     0,     0,     0],
         [   45,    45,     0,     0,     0,     0,     0,     0],
         [10206,     0,     0,     0,     0,     0,     0,     0],
         [ 1128,     0,     0,     0,     0,     0,     0,     0],
         [ 6235,     0,     0,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0,     0,     0],
         [   47,     0,     0,     0,     0,     0,     0,     0]])),

 (array([[    1,     0,     0,     0,     0,     0],
         [20047,     0,     0,     0,     0,     0],
         [ 2820,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0],
         [    1,     0,     0,     0,     0,     0],
         [ 2331,  4945,  1882,     1,  1536,     1],
         [13770, 13773, 24500,     0,     0,     0],
         [    1, 17789,  1012,     1,     0,     0]]),
  array([[1128,    0,    0,    0,    0,    0],
         [6235,    0,    0,    0,    0,    0],
         [   1,    0,    0,    0,    0,    0],
         [  47,    0,    0,    0,    0,    0],
         [  36,    0,    0,    0,    0,    0],
         [  52,    0,    0,    0,    0,    0],
         [  52,    0,    0,    0,    0,    0],
         [  32,   32,   32,   32,   32,   32],
         [  45,   45,   45,    0,    0,    0],
         [  45,   45,   45,   45,    0,    0]])),

The main problem is that tf.keras.layers.Dense(10) seems to accept only a fixed element length per batch (10 is the number of all existing tags). If I do .padded_batch(10, padded_shapes=([10], [10])), everything works.
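For reference, here is that fixed-shape version spelled out (a minimal sketch; MAX_LEN = 10 is just the length I pad to, it only coincidentally matches the tag count):

MAX_LEN = 10  # fixed length every sequence is padded to (my choice, not from the assignment)
dataset = tf.data.Dataset.zip(test_sentences_vec, test_tags_vec).padded_batch(
    10, padded_shapes=([MAX_LEN], [MAX_LEN]))
# note: Dataset.zip(a, b) needs TF >= 2.13; older versions require Dataset.zip((a, b))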

But I checked in your lab, and next(train_generator)[0].shape really does output different shapes from batch to batch. So how was it achieved that your model works fine when the shapes differ from batch to batch?

Hello @someone555777,
To handle data with different padding lengths from batch to batch, you can use the Masking layer in your model. The Masking layer is used to ignore the padded values during the training process, allowing the model to work with variable-length input sequences.

Here’s how you can modify your model to include the Masking layer:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    tf.keras.layers.Embedding(32767, 50),
    tf.keras.layers.Masking(mask_value=0),  # Add the Masking layer
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Softmax()
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.fit(tf.data.Dataset.zip(test_sentences_vec, test_tags_vec).padded_batch(10), epochs=3)

In this example, the Masking layer is added after the Embedding layer. The mask_value parameter is set to 0, which means the layer will mask any timestep whose values are all equal to 0. This is useful when you have padded your input sequences with zeros to make them all the same length.

By adding the Masking layer, your model can now handle input data with different padding lengths from batch to batch, as it will ignore the padded values during training.
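If you want to verify that a mask is really being produced and passed along, here is a minimal sketch (not from the assignment). Keras can also build the mask directly from the integer token ids via Embedding(..., mask_zero=True), which does not depend on the embedded vectors happening to be all zeros:

import tensorflow as tf

emb = tf.keras.layers.Embedding(32767, 50, mask_zero=True)  # mask derived from token id 0
x = tf.constant([[1, 7, 0, 0]])   # one padded sequence
h = emb(x)                        # shape (1, 4, 50)
print(emb.compute_mask(x))        # [[ True  True False False]] -- padding timesteps are masked

Downstream layers such as LSTM consume this mask automatically and skip the masked timesteps.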

I hope I was able to answer your question. Please ask a follow-up question if anything is still unclear.
Regards,
Can Koz

Wow! Cool. I wouldn’t have guessed that for a long time. I will test it, thanks.

But I don’t see that we used masking before training in the initial lab. Is it something Trax-specific?

So, I changed the model to this:

model = keras.Sequential([
    tf.keras.layers.Embedding(32767, 50),
    tf.keras.layers.Masking(mask_value=0),
    tf.keras.layers.LSTM(50),
    tf.keras.layers.Dense(len(contact_type)),
    tf.keras.layers.Softmax()
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

model.fit(tf.data.Dataset.zip(test_sentences_vec, test_tags_vec).padded_batch(10), epochs=3)

And I still get the error:

2023-09-16 15:01:23.670102: W tensorflow/core/common_runtime/type_inference.cc:339] Type inference failed. This indicates an invalid graph that escaped type checking. Error message: INVALID_ARGUMENT: expected compatible input types, but input 1:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_INT32
    }
  }
}
 is neither a subtype nor a supertype of the combined inputs preceding it:
type_id: TFT_OPTIONAL
args {
  type_id: TFT_PRODUCT
  args {
    type_id: TFT_TENSOR
    args {
      type_id: TFT_FLOAT
    }
  }
}

	for Tuple type infernce function 0
	while inferring type of node 'cond_40/output/_23'
2023-09-16 15:01:26.370708: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8600

...

Node: 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits'
logits and labels must have the same first dimension, got logits shape [10,10] and labels shape [20]
	 [[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_train_function_886769]

And a small remark: the quoted code doesn’t work with ‘sparse_categorical_crossentropy’ either, only with ‘categorical_crossentropy’.
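As I understand it, that is because ‘categorical_crossentropy’ expects one-hot targets while ‘sparse_categorical_crossentropy’ expects integer targets, so the tags have to be one-hot encoded first. A sketch of the conversion (num_tags taken from the Dense layer above):

num_tags = len(contact_type)  # number of distinct tags, as in the Dense layer
dataset = (tf.data.Dataset.zip(test_sentences_vec, test_tags_vec)
           .padded_batch(10)
           .map(lambda x, y: (x, tf.one_hot(y, num_tags))))  # integer tags -> one-hot vectors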

I don’t know how, but my model magically began to train successfully, and even with just these layers:

model = keras.Sequential([
    tf.keras.layers.Embedding(32767, 50),
    tf.keras.layers.Dense(len(contact_type)),
    tf.keras.layers.Softmax()
])
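My guess why this trains: Dense is applied to the last axis of a 3-D input, so with the Embedding in front the model predicts one tag per token, and the logits keep the same time dimension as the padded labels. A quick shape check (assuming the same vocab and tag sizes as above):

x = tf.zeros((10, 8), dtype=tf.int32)                 # a padded batch: (batch, time)
h = tf.keras.layers.Embedding(32767, 50)(x)           # (10, 8, 50)
logits = tf.keras.layers.Dense(len(contact_type))(h)  # (10, 8, num_tags): one prediction per token
# sparse_categorical_crossentropy can now match these logits with labels of shape (10, 8)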

And it doesn’t work without the Embedding layer, by the way. Can you tell me whether there are any options to omit the Embedding if I use batches with different lengths?