The initial equation is Z = WX + b, so W has dimensions N×h, where N is the vocabulary size and h is the embedding size. At one point in the trax assignment, the dense layer computes x * w instead, which changes the dimensions of w, but the text does not explicitly define them. Later, the mean layer averages over axis=1, which means averaging across N, so w would now have dimensions (h, N). Can someone please confirm that I understand this correctly? Thanks.
Hi Xixi_NXCR,
In case you still have this question: As far as I can see, you are correct in your understanding.
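One thing that may help: the two forms you mention differ only by a transpose of the weight matrix, so the dimensions swap depending on which side the input sits. Here is a minimal NumPy sketch (the sizes are illustrative, not the assignment's actual values):

```python
import numpy as np

vocab_size, embedding_dim = 100, 8          # illustrative N and h

x = np.random.rand(vocab_size)              # input vector of length N

# Convention 1: Z = W x + b  ->  W has shape (h, N)
W = np.random.rand(embedding_dim, vocab_size)
b = np.random.rand(embedding_dim)
z1 = W @ x + b
print(z1.shape)                             # (8,)

# Convention 2 (input on the left): Z = x w + b  ->  w has shape (N, h)
w = W.T                                     # same weights, transposed
z2 = x @ w + b
print(z2.shape)                             # (8,)

# Both conventions produce the same output
np.testing.assert_allclose(z1, z2)
```

So whether w is (N, h) or (h, N) just depends on whether the layer computes x @ w or W @ x.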
I am trying to understand this as well. Is the output dimension of the mean layer (h, 1), where h is the embedding size? That would mean that, for each word, we get the mean of all of its embedding values after they have been transformed by the weights and bias of the embedding layer. Is that correct?
Hi Remington_Lambie,
As I understand it, the output dimension of the embedding layer is (batch_size, seq_len, embedding_dim), where seq_len is the length of the (padded) input sequence. Mean takes the average over the sequence axis (axis=1), so you end up with (batch_size, embedding_dim).
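A quick way to check the shapes is a NumPy sketch of an embedding lookup followed by a mean over axis=1 (all sizes below are illustrative, not the assignment's actual values):

```python
import numpy as np

batch_size, seq_len = 2, 5                  # illustrative values
vocab_size, embedding_dim = 100, 8

# Embedding weight matrix: one row of length embedding_dim per vocab entry
W = np.random.rand(vocab_size, embedding_dim)

# A batch of token-id sequences: shape (batch_size, seq_len)
tokens = np.random.randint(0, vocab_size, size=(batch_size, seq_len))

# Embedding lookup: shape (batch_size, seq_len, embedding_dim)
embedded = W[tokens]
print(embedded.shape)                       # (2, 5, 8)

# Mean over axis=1 averages across the sequence positions,
# leaving shape (batch_size, embedding_dim)
mean_out = embedded.mean(axis=1)
print(mean_out.shape)                       # (2, 8)
```

So the mean collapses the middle (sequence) axis, and each example ends up as a single embedding-sized vector.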
Thanks, this helped me