What is the value of learning RNNs in the world of LLMs?

You might also find this thread interesting, since it traces the path from RNNs to the transformer architecture and refers to highly relevant and popular papers:

Best regards
Christian