T5 Model Architecture

The “Transformer: T5” lecture video in C4W3 has a slide that shows an encoder/decoder, a language model, and a prefix LM architecture. The video ends by saying that I now know what the T5 architecture looks like. How do the three architectures relate to one another? Are they all part of the T5 architecture? I found this very confusing, and it wasn’t clear how the three diagrams related to the next slide.


Hi @esav

I agree that this video is far from crystal clear. What I can offer is that:

  • the left side (“encoder/decoder”) illustrates the traditional encoder-decoder architecture (non-causal attention on the input, causal attention on the output, as in the “Attention Is All You Need” paper for translation)
  • the middle (somewhat ambiguously named “Language model”; a clearer name would have been “Causal LM”) illustrates the decoder-only architecture (like GPT-style models)
  • the right side (“Prefix LM”) illustrates the T5 architecture - here, a prefix boundary splits the sequence into two sections. Within the prefix section, any token can attend to any other token (non-causal, like in the traditional encoder). Within the other section, each position can attend only to itself and previous tokens, including the whole prefix section (causal, like in the “Causal LM”).
    From the paper: [attached figure illustrating the three attention-mask patterns]

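The three attention patterns above can be sketched as boolean masks, where entry (i, j) says whether position i may attend to position j. This is a minimal illustration (not from the course materials), assuming a length-6 sequence whose first 3 tokens form the prefix:

```python
import numpy as np

n, prefix_len = 6, 3  # assumed example sizes

# Fully visible (encoder side of encoder/decoder):
# every token attends to every other token.
fully_visible = np.ones((n, n), dtype=bool)

# Causal LM (decoder-only, GPT-style):
# token i attends only to positions 0..i (lower triangle).
causal = np.tril(np.ones((n, n), dtype=bool))

# Prefix LM (T5): causal everywhere, except the prefix block,
# which is fully visible to itself.
prefix_lm = causal.copy()
prefix_lm[:prefix_len, :prefix_len] = True

print(prefix_lm.astype(int))
```

Printing the masks makes the relationship visible: the prefix LM mask is the causal mask with its top-left prefix-by-prefix block filled in, which is exactly the sense in which T5’s “causal with prefix” sits between the other two patterns.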
Let me know if that helps. Cheers

I see! So “causal with prefix” is the T5 architecture. Thanks!