What about Reformer? Can this architecture help with the scaling issue?

Hello @ajeancharles, from what I can see in the Reformer paper, it seems to address exactly what you mention. However, I could not find broad adoption of this technique in the recent literature. Not having tried Reformer myself, I can only guess that there are good reasons why it is not widely used nowadays. The other techniques explained in the course have already become de facto standards in the short time they have existed.
Reformer used Locality Sensitive Hashing (LSH) and reversible residual layers, but even by 2020 other models were outperforming it. See the Long Range Arena benchmark.
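
For anyone curious what those two ideas look like in practice, here is a rough pure-Python sketch, not the actual Reformer code. It assumes the angular LSH scheme described in the paper (project onto random directions and take the argmax over the projections and their negations) and the standard reversible residual coupling. The function names `lsh_buckets`, `rev_forward`, and `rev_inverse` are my own, chosen only for illustration.

```python
import random


def lsh_buckets(vectors, n_buckets, seed=0):
    """Assign each vector to a bucket with angular LSH.

    Each vector x is projected onto n_buckets // 2 random directions R,
    and the bucket is argmax over the concatenation [xR, -xR].
    Vectors pointing in similar directions tend to share a bucket.
    """
    assert n_buckets % 2 == 0, "n_buckets must be even"
    rng = random.Random(seed)
    dim = len(vectors[0])
    half = n_buckets // 2
    # Random Gaussian projection matrix with shape (dim, half).
    R = [[rng.gauss(0.0, 1.0) for _ in range(half)] for _ in range(dim)]
    buckets = []
    for x in vectors:
        proj = [sum(x[i] * R[i][j] for i in range(dim)) for j in range(half)]
        scores = proj + [-p for p in proj]
        buckets.append(max(range(n_buckets), key=scores.__getitem__))
    return buckets


def rev_forward(x1, x2, f, g):
    """Reversible residual block: the inputs can be recomputed from the outputs."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2


def rev_inverse(y1, y2, f, g):
    """Recover the block inputs from its outputs, so activations need not be stored."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2
```

In the real model, attention is then computed only among tokens that land in the same bucket, which is where the savings over full quadratic attention come from. The reversible block saves memory because activations are recomputed during the backward pass instead of being stored for every layer.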

