Question about Dimension of Model Input

YIHUI · August 9, 2022, 10:25pm

The data generator yields the training data as a set of batches, and each batch consists of multiple sentences.

However, when the training data is fed into the model, does the classifier below process all the words of one sentence at once and then take the mean over those words before the final layer, since there is only one target value per sentence? That is, for each sentence, is the output of the embedding layer of shape (seq_length, embed_dim)?

Using axis=1 in tl.Mean does not seem correct to me: we want the average of each embedding feature across the seq_length words, so I think axis=0 should be correct. However, axis=0 raises errors while axis=1 does not. Why?

arvyzukai · August 10, 2022, 7:23am

Hi @YIHUI

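Since we are training in batches, the input to the mean layer has shape (batch_size, seq_length, embed_dim), so axis 1 is seq_length. Averaging over it gives one embed_dim vector per sentence, i.e. an output of shape (batch_size, embed_dim).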
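To make the shapes concrete, here is a minimal NumPy sketch (not the assignment code, and not Trax itself). The sizes 4, 7 and 3 are made-up values for illustration, and the random array only stands in for the output of the embedding layer:

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, seq_length, embed_dim = 4, 7, 3

# Stand-in for the embedding layer output of one batch.
rng = np.random.default_rng(0)
embedded = rng.normal(size=(batch_size, seq_length, embed_dim))

# axis=1 averages over the words of each sentence:
# one embed_dim vector per sentence.
mean_over_words = embedded.mean(axis=1)
print(mean_over_words.shape)  # (4, 3)

# axis=0 averages over the sentences in the batch instead,
# so the batch dimension is lost.
mean_over_batch = embedded.mean(axis=0)
print(mean_over_batch.shape)  # (7, 3)

# For a single sentence of shape (seq_length, embed_dim), axis=0 is the
# word axis, which matches the per-sentence intuition in the question.
single_sentence = embedded[0]
print(np.allclose(single_sentence.mean(axis=0), mean_over_words[0]))  # True
```

With axis=0 the result no longer has one row per sentence, so it most likely cannot be matched against the batch_size targets downstream, which would explain the errors.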

P.S. Please remove your solution code, since posting it is against the rules.
