The comment in the “call” function of class EncoderLayer says:
x – Tensor of shape (batch_size, input_seq_len, fully_connected_dim)
Shouldn’t the last dimension be “embedding_dim”, considering that the inputs to the encoder layer are word embedding vectors?
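
To make the concern concrete, here is a minimal shape-tracing sketch in plain Python. It assumes the usual Transformer encoder layout (self-attention that preserves the input shape, a feed-forward block that expands to fully_connected_dim and projects back to embedding_dim, and a residual connection around each). The helper names and the sizes are hypothetical, chosen only for illustration; this is not the actual EncoderLayer code.

```python
# Hypothetical sizes, chosen only for illustration.
batch_size, input_seq_len = 2, 5
embedding_dim, fully_connected_dim = 4, 8


def dense_shape(in_shape, units):
    """Shape after a dense layer applied to the last axis."""
    return in_shape[:-1] + (units,)


def residual_shape(a, b):
    """Shape after a + b; this sketch requires identical shapes."""
    if a != b:
        raise ValueError(f"residual shapes differ: {a} vs {b}")
    return a


def encoder_layer_shape(x_shape):
    """Trace shapes through an assumed encoder layer layout."""
    # Assumption: self-attention returns the same shape as its input.
    attn = x_shape
    out1 = residual_shape(x_shape, attn)
    # Assumption: the feed-forward block expands to fully_connected_dim,
    # then projects back to embedding_dim.
    hidden = dense_shape(out1, fully_connected_dim)
    ffn = dense_shape(hidden, embedding_dim)
    # The second residual add only works if out1 ends in embedding_dim.
    return residual_shape(out1, ffn)


x_shape = (batch_size, input_seq_len, embedding_dim)
print(encoder_layer_shape(x_shape))
```

Under these assumptions, an input whose last dimension is embedding_dim passes through with its shape unchanged, while an input whose last dimension is fully_connected_dim fails at the second residual add. That is why the docstring looks like it should say embedding_dim.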