The transformer architecture is notoriously inefficient on long sequences, a problem when processing images, which are essentially long sequences of pixels.
What’s new: Zizhao Zhang and colleagues at Google and Rutgers University simplified an earlier proposal for using transformers to process images. They call their architecture NesT.
Why it matters: Transformers typically bog down when processing images. NesT could help vision applications take fuller advantage of the architecture’s strengths.
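A rough sketch of why transformers bog down on images, and why a blocked (local) attention scheme like the one NesT builds on helps: full self-attention scores every token pair, which is quadratic in sequence length, while restricting attention to local blocks makes the cost linear. The function names below are illustrative, not from the NesT paper.

```python
def full_attention_pairs(num_tokens: int) -> int:
    # Full self-attention: every token attends to every token,
    # so the number of score computations grows quadratically.
    return num_tokens ** 2

def blocked_attention_pairs(num_tokens: int, block_size: int) -> int:
    # Local attention: tokens only attend within their own block,
    # so cost grows linearly with sequence length.
    num_blocks = num_tokens // block_size
    return num_blocks * block_size ** 2

# A 224x224 image split into 16x16 patches yields 196 tokens.
print(full_attention_pairs(196))         # quadratic cost
print(blocked_attention_pairs(196, 14))  # much cheaper linear cost
```

The gap widens quickly at higher resolutions, which is why local-attention variants matter for vision.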
We’re thinking: Computational efficiency for the Swin!
Great summary of the latest news from The Batch. The NesT architecture and the computational efficiency it brings are good news for the vision community.
Two other things I found interesting in this week’s Batch are Andrew’s notes on academia vs. industry and the news about the Multitask Unified Model (MUM) that Google plans to introduce in Google Search and Google Lens.
The comparison between academia and industry is definitely a must-read for anyone who would like to transition from one to the other.
Here is the link to this week’s Batch to read more about academia and industry, NesT, Google MUM, and other AI news.
We look forward to next week’s newsletter!