Just curious, is it OK not to set an input length in an embedding layer? If a sentence from the training data is very long, wouldn't the embedding layer's weights become large, and could that cause slow computation or memory issues?
Also, only embedding_dim=64 is provided, so where does the Param # of 523840 come from? Thanks!
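For context, here is a minimal sketch of roughly what I mean (the vocabulary size and the rest of the model are placeholders, not my actual values):

```python
import tensorflow as tf

VOCAB_SIZE = 10000  # placeholder, not my actual vocabulary size

model = tf.keras.Sequential([
    # Variable-length integer sequences; no input_length is set anywhere
    tf.keras.Input(shape=(None,), dtype="int32"),
    tf.keras.layers.Embedding(input_dim=VOCAB_SIZE, output_dim=64),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1),
])
model.summary()
```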
Regarding memory consumption and computation: adding very long sentences will have some impact on both, but the impact on computation is not as large as you might expect.
The Param # is vocabulary size * embedding dim. Have you changed the vocabulary size as well?
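For example, 523840 / 64 = 8185, which would correspond to a vocabulary of 8185 tokens (that number is only inferred from your Param #, not something I can confirm). A quick check of the formula:

```python
import tensorflow as tf

# 8185 is inferred from 523840 / 64; substitute your encoder's vocabulary size
embedding = tf.keras.layers.Embedding(input_dim=8185, output_dim=64)
embedding.build(input_shape=(None, None))
print(embedding.count_params())  # 8185 * 64 = 523840
```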
Why is the maximum length not shown in the output shape of the embedding layer?
I assume that when using the padded_batch method, the length should equal BATCH_SIZE, which is set to 64. Given that the embedding dimension is also 64, I expect the output shape to be something like (None, 64, 64), not (None, None, 64). A small sketch of how I'm reading the shapes follows below.
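Here is the sketch, with made-up integer sequences standing in for my actual encoded sentences:

```python
import tensorflow as tf

BATCH_SIZE = 64

# Toy sequences of different lengths, standing in for encoded sentences
sentences = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10]] * 100

dataset = tf.data.Dataset.from_generator(
    lambda: iter(sentences),
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32),
)
dataset = dataset.padded_batch(BATCH_SIZE)

for batch in dataset.take(1):
    print(batch.shape)  # e.g. (64, 5) for this toy data
```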