Why are the parameters & shape of Embedding layer different for subwords?

In labs 1 & 2, where we tokenized our own datasets, we passed the following parameters to the Embedding layer when defining the model:
tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length)

However, when using pre-tokenized subwords, we pass only two parameters to the Embedding layer:
tf.keras.layers.Embedding(tokenizer_subwords.vocab_size, embedding_dim)

Also, the output shape of the Embedding layer is different; it has two None dimensions:
embedding (Embedding) (None, None, 64) 523840

Can someone explain what’s happening and why there is this difference between the two versions?
Thanks! :slight_smile:

The output shape of an Embedding layer is (BATCH_SIZE, MAX_LENGTH_OF_INPUT_SENTENCE, EMBEDDING_DIM_PER_TOKEN). When you skip the input_length parameter, TensorFlow can’t infer the maximum length of the input sentences, hence the None in the 2nd dimension.
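
To see this concretely, here is a minimal sketch (assuming TF 2.x, where Embedding still accepts input_length; the vocab size of 8185 is an assumption that matches 523840 / 64 from the summary above, and max_length=120 is a made-up value):

```python
import tensorflow as tf

vocab_size = 8185     # assumed: 523840 params / 64 dims from the summary above
embedding_dim = 64
max_length = 120      # hypothetical padding length

# With input_length, the 2nd dimension of the output shape is fixed
fixed = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length)
])
fixed.build(input_shape=(None, max_length))
fixed.summary()       # embedding output shape: (None, 120, 64), 523840 params

# Without input_length, the sequence length is left unknown
flexible = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim)
])
flexible.build(input_shape=(None, None))
flexible.summary()    # embedding output shape: (None, None, 64), 523840 params
```

Note that the parameter count is the same in both cases, since it only depends on vocab_size × embedding_dim.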

The exercises generally contain a tokenizer followed by a padding call, which encodes words / subwords into integers and pads / truncates each row to the provided length. The advantage of providing input_length is that the model summary is clearer. The drawback is that the data must always have the same shape: for instance, specifying input_length as 120 means that all training / testing data fed to the embedding layer must have the shape (BATCH_SIZE, 120). This means that even if a batch contains only short sentences, you’ll needlessly pad them all the way to the maximum length. On the other hand, if you don’t provide the input_length parameter, you can pad each batch only to the longest sentence in that batch, which is more efficient.
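
For example, this is roughly the difference between padding everything up front with pad_sequences and letting tf.data pad each batch on the fly (the toy sequences, batch size, and maxlen below are made-up values, not the lab’s):

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy integer-encoded sentences of different lengths (hypothetical data)
sequences = [[3, 7, 2], [5, 1], [9, 4, 6, 8, 2, 7]]

# Option 1: fixed input_length -> every row is padded/truncated to 120
fixed = pad_sequences(sequences, maxlen=120, padding='post', truncating='post')
print(fixed.shape)          # (3, 120), even though the longest sentence has 6 tokens

# Option 2: no input_length -> pad each batch only to that batch's longest sentence
ds = tf.data.Dataset.from_generator(
    lambda: iter(sequences),
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32))
for batch in ds.padded_batch(2):
    print(batch.shape)      # (2, 3) then (1, 6): shorter batches carry less padding
```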
