I think you misunderstand what word embeddings are: it is not a matter of "deep n-grams (GRU, LSTM) versus word embeddings", since the two are not competing alternatives; a minimal sketch of what an embedding layer actually does follows the list below.
Here are some threads about word embeddings:
- Intuition behind using the weights of a CBOW model as word embeddings
- How does trax word embedding layer work?
- Embedding layer, why is it needed?
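As a rough intuition (a minimal sketch of my own, not taken from those threads; the vocabulary size, embedding dimension, and token ids below are made-up illustrative values): an embedding layer is just a trainable lookup table from integer token ids to dense vectors, and looking up a row is mathematically the same as multiplying a one-hot vector by the weight matrix, which is why the input weights of a trained CBOW model can be reused as word embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 1_000, 16  # made-up sizes for illustration

# The embedding layer is just a (vocab_size, d_model) weight matrix,
# initialized randomly and trained like any other layer's weights.
embedding = rng.normal(scale=0.01, size=(vocab_size, d_model))

# A tokenized input is a sequence of integer ids (hypothetical ids here).
token_ids = np.array([12, 507, 3])

# "Embedding lookup" is plain row indexing ...
vectors = embedding[token_ids]            # shape (3, d_model)

# ... which is equivalent to one-hot vectors times the weight matrix.
# This equivalence is why CBOW's input weight matrix can serve as the
# word-embedding table after training.
one_hot = np.eye(vocab_size)[token_ids]   # shape (3, vocab_size)
assert np.allclose(one_hot @ embedding, vectors)
```

In a "deep n-grams" model these vectors would next be fed into the GRU/LSTM, so embeddings and recurrent layers are complementary parts of the same network rather than rival approaches.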
There are also several threads about RNNs worth looking up.
Cheers