What about Reformer, can this architecture help?

What about Reformer? Can this architecture help with the scaling issue?

Hello @ajeancharles, from what I can see in the Reformer paper, it seems to address exactly what you mention. However, I could not find broad adoption of this technique in the recent literature. Without having tried Reformer myself, I can only guess that there are good reasons why it is not widely used nowadays. The rest of the techniques explained in the course have already become de facto standards in the short time they have existed.
If you think I am wrong, please let me know.
Does it make sense?


Reformer used locality-sensitive hashing (LSH) attention and reversible residual layers, but even by 2020 other models were outperforming it. See the Long Range Arena benchmark.
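
In case it helps to see the LSH idea concretely, here is a minimal pure-Python sketch of the angular hashing step as I understand it from the Reformer paper: project a vector with a random matrix R and take the argmax over the concatenation [xR, -xR] as the bucket id. Attention is then restricted to positions that share a bucket instead of all pairs. The function names (`make_projection`, `lsh_bucket`, `group_by_bucket`) and the toy vectors are my own for illustration and are not taken from any Reformer implementation; the real model also uses multiple hash rounds, sorting, and chunking, which are left out here.

```python
import random


def make_projection(dim, n_buckets, seed=0):
    """Build a random Gaussian matrix of shape (dim, n_buckets // 2)."""
    if n_buckets % 2 != 0:
        raise ValueError("n_buckets must be even")
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_buckets // 2)]
            for _ in range(dim)]


def lsh_bucket(vec, projection):
    """Angular LSH: bucket id = argmax over [xR, -xR]."""
    half = len(projection[0])
    rotated = [sum(vec[i] * projection[i][j] for i in range(len(vec)))
               for j in range(half)]
    scores = rotated + [-r for r in rotated]
    return max(range(len(scores)), key=scores.__getitem__)


def group_by_bucket(vectors, projection):
    """Map each bucket id to the list of positions hashed into it."""
    buckets = {}
    for position, vec in enumerate(vectors):
        buckets.setdefault(lsh_bucket(vec, projection), []).append(position)
    return buckets


if __name__ == "__main__":
    projection = make_projection(dim=4, n_buckets=8, seed=0)
    # Toy "query/key" vectors; positions 0 and 1 point in the same direction.
    vectors = [
        [1.0, 0.5, -0.3, 2.0],
        [2.0, 1.0, -0.6, 4.0],
        [-1.0, 0.2, 0.9, -0.5],
        [0.1, -2.0, 0.4, 0.3],
    ]
    for bucket_id, positions in sorted(group_by_bucket(vectors, projection).items()):
        print(bucket_id, positions)
```

Because the hash depends only on direction, vectors with the same direction always land in the same bucket, and vectors with nearby directions usually do. Each position then attends only within its bucket, which is how the paper reduces the attention cost from O(L^2) toward O(L log L).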

Thank you! Good to know.