Number of LSTM units in Trax

Thank you @arvyzukai. That is very helpful.

With respect to your example, the shape of the embedded sentence is (8, 5), where 8 = number of words in the sentence including PADs and 5 = length of the embedding.

In your example, each of the 8 embedded words goes through the LSTM one token at a time, so shouldn’t the number of LSTM units be 8? Whereas the number of units based on the embedding size would be 5. If it is 5, then how does each of the 8 words get to go through the LSTM and produce an output that depends on the previous word?
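To make my question concrete, here is a minimal NumPy sketch of how I currently picture the recurrence (the shapes are the ones from your example; the weights are random placeholders, not Trax’s actual implementation). In this picture the unit count is the size of the hidden/cell state (5), and the 8 tokens are timesteps that reuse the same cell:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, emb_dim, n_units = 8, 5, 5  # 8 tokens (incl. PADs), embedding size 5

# One weight matrix per gate: input, forget, candidate, output.
# Each maps [x_t ; h_{t-1}] (size emb_dim + n_units) to n_units values.
W = {g: rng.normal(size=(emb_dim + n_units, n_units)) * 0.1 for g in "ifgo"}
b = {g: np.zeros(n_units) for g in "ifgo"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(seq_len, emb_dim))  # the (8, 5) embedded sentence
h = np.zeros(n_units)                    # hidden state, size = n_units
c = np.zeros(n_units)                    # cell state, size = n_units

outputs = []
for t in range(seq_len):                 # the SAME cell is applied 8 times
    z = np.concatenate([x[t], h])        # current token + previous hidden state
    i = sigmoid(z @ W["i"] + b["i"])     # input gate
    f = sigmoid(z @ W["f"] + b["f"])     # forget gate
    g = np.tanh(z @ W["g"] + b["g"])     # candidate cell state
    o = sigmoid(z @ W["o"] + b["o"])     # output gate
    c = f * c + i * g                    # new cell state
    h = o * np.tanh(c)                   # new hidden state, fed to next step
    outputs.append(h)

outputs = np.stack(outputs)
print(outputs.shape)  # (8, 5): one n_units-sized output vector per token
```

So under this reading, word t + 1 “sees” word t because h and c are carried across the loop iterations, not because there is one unit per word. Is that the correct way to understand it?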