In EncoderLayer, x in the call method is documented as having shape (batch_size, input_seq_len, fully_connected_dim). However, when I printed fully_connected_dim it was 8, while x had shape (1, 3, 4) and embedding_dim was 4.
Should all the comments (in both EncoderLayer and Encoder) that say (batch_size, input_seq_len, fully_connected_dim) be replaced with (batch_size, input_seq_len, embedding_dim)? Otherwise I'm not sure how the shapes of the layer call outputs line up (e.g., self.ffn(out1)'s output should have shape (None, input_seq_len, embedding_dim)).
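For what it's worth, the shapes do line up the way you describe if the FFN is the usual point-wise two-layer stack (Dense(fully_connected_dim) followed by Dense(embedding_dim)): fully_connected_dim is only the *hidden* width, and the output comes back to embedding_dim. Here is a minimal numpy sketch of that bookkeeping, using the dims from your observation (embedding_dim=4, fully_connected_dim=8) — just an illustration, not the assignment's actual code:

```python
import numpy as np

# Dims taken from the question: x is (1, 3, 4), hidden FFN width is 8
batch_size, input_seq_len = 1, 3
embedding_dim, fully_connected_dim = 4, 8

x = np.zeros((batch_size, input_seq_len, embedding_dim))

# Point-wise FFN: project up to fully_connected_dim, then back down
W1 = np.zeros((embedding_dim, fully_connected_dim))
W2 = np.zeros((fully_connected_dim, embedding_dim))

hidden = np.maximum(x @ W1, 0)  # ReLU; shape (batch_size, input_seq_len, fully_connected_dim)
out = hidden @ W2               # shape (batch_size, input_seq_len, embedding_dim)

print(hidden.shape)  # (1, 3, 8) -- only the intermediate activation uses fully_connected_dim
print(out.shape)     # (1, 3, 4) -- the FFN output matches the input's embedding_dim
```

So the comments on the layer inputs/outputs would indeed be more accurate with embedding_dim; fully_connected_dim only applies inside the FFN.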