Hi @Peixi_Zhu
Yes, you understand that correctly. In addition, there is usually a batch_size dimension in front.
In other words, if the input is [n_sentences, n_tokens_padded] (n_sentences here is equivalent to batch_size), then the output of the embedding layer is [n_sentences, n_tokens_padded, embedding_size], for example (32, 64, 1024). A simple example follows.
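Here is a minimal sketch of that shape transformation with trax (the vocab size and all shapes are just the made-up example numbers from above, not anything specific to your model):

```python
import numpy as np
import trax
import trax.layers as tl

# Hypothetical sizes: a vocab of 10_000 token ids, 1024-dimensional embeddings
embed = tl.Embedding(vocab_size=10_000, d_feature=1024)

# A batch of 32 padded sentences, each 64 token ids long
x = np.zeros((32, 64), dtype=np.int32)
embed.init(trax.shapes.signature(x))

y = embed(x)
print(y.shape)  # (32, 64, 1024) -> [n_sentences, n_tokens_padded, embedding_size]
```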
I’m not sure I understand. In general, you are the one who tells trax what size you want each layer to be (and you are the one who has to make sure those sizes are reasonable).
Yes, absolutely. Under the hood it is very similar to a Dense (linear) layer, as you said in your first question: it takes the n-th token id (for example 54) and returns a vector (for example, a row of 1024 numbers) whose values are updated according to the loss during training.
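To illustrate the lookup idea only (this is a toy numpy sketch with made-up sizes, not trax's actual internals verbatim):

```python
import numpy as np

# An embedding is essentially a [vocab_size, embedding_size] weight matrix
# that is trained like Dense weights.
vocab_size, embedding_size = 100, 8          # small made-up numbers
weights = np.random.randn(vocab_size, embedding_size)

token_id = 54                 # the n-th token in the vocabulary
vector = weights[token_id]    # row lookup -> one embedding_size-long vector
print(vector.shape)           # (8,)
```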
Cheers