In labs 1 and 2, where we were tokenizing our own datasets, we passed the following parameters to the Embedding layer when defining the model:
tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length)
However, when using pre-tokenized subwords, we are only passing 2 parameters into the Embedding layer:
tf.keras.layers.Embedding(tokenizer_subwords.vocab_size, embedding_dim)
Also, the output shape of the Embedding layer is different; it has two `None` dimensions:
embedding (Embedding) (None, None, 64) 523840
Can someone explain what’s happening and why there is this difference between the two versions?
Thanks!
The output shape of an embedding layer stands for (BATCH_SIZE, MAX_LENGTH_OF_INPUT_SENTENCE, EMBEDDING_DIM_PER_TOKEN). When we skip the `input_length` parameter, TensorFlow can’t infer the maximum length of the input sentences, hence the `None` in the 2nd dimension.
The exercises in general contain a tokenizer followed by a padding call, which encodes words / subwords into integers and pads / truncates each row to the provided length. The advantage of providing `input_length` is that the model summary is clear. The drawback of specifying the `input_length` parameter is that the data must always have the same shape. For instance, specifying `input_length` as 120 means that all training / testing data fed to the embedding layer must be of the form (BATCH_SIZE, 120). This means that shorter sentences are needlessly padded all the way to the maximum length. On the other hand, if you don’t provide the `input_length` parameter, you can pad each batch only to the maximum sentence length within that batch, which is more efficient.
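As a rough illustration of that trade-off (the toy sequences and batch size below are made up), fixed-length padding versus per-batch padding could look like this:

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical toy sequences of different lengths.
sequences = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]]

# Fixed-length padding: every row is padded/truncated to 120 so it matches
# an Embedding layer defined with input_length=120.
fixed = pad_sequences(sequences, maxlen=120, padding='post')
print(fixed.shape)          # (3, 120)

# Per-batch padding: each batch is padded only to its own longest sequence,
# which the Embedding layer accepts when no input_length is given.
dataset = tf.data.Dataset.from_generator(
    lambda: iter(sequences),
    output_signature=tf.TensorSpec(shape=[None], dtype=tf.int32))
for batch in dataset.padded_batch(2, padded_shapes=[None]):
    print(batch.shape)      # (2, 3), then (1, 5)
```

With per-batch padding, each batch carries only as much padding as its own longest sentence requires, instead of every row being stretched to 120.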