Transformer architecture is smarter than you think

Hey everyone!

In the latest issue of The Batch, you can read about an exciting new development in transformer architectures.

What’s new: Kevin Lu and colleagues at UC Berkeley, Facebook, and Google devised the Frozen Pretrained Transformer (FPT). After pretraining a transformer network on language data, they showed that it could perform vision, mathematical, and logical tasks without fine-tuning its core layers.
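
To make the recipe concrete, here is a minimal sketch of the freezing idea in PyTorch, using Hugging Face’s GPT-2 as a stand-in for the pretrained language model. The specific choices below (which lightweight parameters to unfreeze, the input dimension of 48, the 10-class output head) are illustrative assumptions, not the authors’ exact configuration:

```python
# Rough sketch of FPT-style transfer: keep the pretrained transformer's
# self-attention and feed-forward weights frozen, train only small
# input/output layers plus a few lightweight parameters.
import torch
from torch import nn
from transformers import GPT2Model

backbone = GPT2Model.from_pretrained("gpt2")

# Freeze the entire pretrained backbone first...
for param in backbone.parameters():
    param.requires_grad = False

# ...then unfreeze layer norms and positional embeddings, a small fraction
# of the total parameters (an approximation of the paper's setup).
for name, param in backbone.named_parameters():
    if "ln" in name or "wpe" in name:
        param.requires_grad = True

class FrozenPretrainedTransformer(nn.Module):
    """New input projection and output head, trained from scratch for the
    target task (e.g. image patches in, class logits out)."""

    def __init__(self, backbone, input_dim, num_classes):
        super().__init__()
        self.embed_in = nn.Linear(input_dim, backbone.config.n_embd)
        self.backbone = backbone
        self.head = nn.Linear(backbone.config.n_embd, num_classes)

    def forward(self, x):  # x: (batch, seq_len, input_dim)
        h = self.backbone(inputs_embeds=self.embed_in(x)).last_hidden_state
        return self.head(h[:, -1])  # classify from the final position

model = FrozenPretrainedTransformer(backbone, input_dim=48, num_classes=10)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```

Counting the trainable parameters makes the point of the approach visible: almost all of the network’s capacity comes from the frozen, language-pretrained weights, while only a thin adapter is learned for the new modality.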

Why it matters: It appears that similar information structures (in the authors’ terms, grammars) pervade the world. Applying representations learned in one domain to another domain may conserve training time and lead to better multimodal models.

We’re thinking: It’s surprising that cross-modal pretraining works this well! Are there underlying statistics, common to many types of sequences, that we don’t yet appreciate?

What do you think comes next? :boom: :exploding_head:

To read the full story, click HERE.