Is this true: "Encoder-only models are also known as autoencoding models"?

The lecturer said: “Encoder-only models are also known as autoencoding models.”

But Wikipedia says that “An autoencoder learns two functions: an encoding function that transforms the input data, and a decoding function …”

So, does an autoencoder have a decoding function? The video and Wikipedia seem to be saying opposite things. I’m very confused.

The video can be found in Week 1, “Pre-training large language models”, at 3:43.

Valid point. Encoder-only models are exactly that: encoders. They don’t contain the ‘decoder’ part of the transformer.

Autoencoders do have an encoder but, as you point out, they also have a decoding function.


Thanks, Juan. Hopefully, that part of the video will be updated in the next iteration of the course.

I’ve sent a signal to the monitors. Let’s wait and see how this evolves. Thanks!


Hello @myusername,

I think we shouldn’t mix up the two sets of terminology here.

Wikipedia’s autoencoder indeed has both an encoder and a decoder, and a perfect decoder is expected to reproduce the input.

Source: that Wikipedia page.

A transformer encoder, on its own, is expected to reproduce (the masked tokens of) the input sentence.
Source: the BERT paper.

They just share the English word “encoder”, but it has a different meaning in each context. You might say the transformer encoder is called the “transformer encoder” because there exists the option of a “transformer decoder” downstream, and we need to cope with the naming conventions.
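For intuition, here is a minimal sketch of the masked-token objective from the BERT paper, in plain Python. Only the data-preparation step is shown; the tokenizer, the encoder model, and the 80/10/10 replacement details from the paper are all left out, and the toy sentence, `MASK` symbol, and helper names are purely illustrative:

```python
import random

random.seed(7)
tokens = "the cat sat on the mat".split()

# BERT-style masking: hide roughly 15% of the tokens (at least one) and
# train the encoder to predict the originals from the surrounding context.
MASK = "[MASK]"
n_mask = max(1, round(0.15 * len(tokens)))
positions = set(random.sample(range(len(tokens)), n_mask))

masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
targets = {i: tokens[i] for i in positions}  # what the encoder must recover

print(masked)
print(targets)
```

In this sense the encoder alone “reproduces” its input: the training signal is to fill the `[MASK]` positions back in, with no separate decoder stack involved.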


  • Encoder-only models and autoencoders are different concepts.
  • Encoder-only models do not have a decoder; they map the input to learned representations.
  • Autoencoders consist of both an encoder and a decoder.
  • Encoder-only models are used when only the encoding part is needed, while autoencoders are used for unsupervised learning tasks to compress and reconstruct data.

This is how autoencoders are trained: with both an encoder and a decoder.

However, depending on the use case, autoencoders can also be used with only the encoder at inference time, e.g. for dimensionality reduction in downstream tasks. Here the trained encoder takes care of feature extraction into a low-dimensional space, i.e. efficient compression of high-dimensional data. After dimensionality reduction, these data are usually processed further, e.g. with clustering, classification, or visualization, using the efficient latent-space representation.
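To make the two phases concrete, here is a minimal sketch using a toy linear autoencoder in NumPy. All sizes, learning rate, and data are illustrative assumptions, not any particular model: first train encoder plus decoder to reconstruct the input, then drop the decoder and keep only the encoder for dimensionality reduction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 10-D that actually lie on a 2-D subspace.
basis = rng.normal(size=(2, 10))
X = rng.normal(size=(200, 2)) @ basis

# A minimal linear autoencoder: encoder W_e (10 -> 2), decoder W_d (2 -> 10),
# trained to reconstruct X by plain gradient descent on squared error.
W_e = rng.normal(scale=0.1, size=(10, 2))
W_d = rng.normal(scale=0.1, size=(2, 10))
lr = 0.01
for _ in range(2000):
    Z = X @ W_e              # encode: compress to the 2-D latent space
    X_hat = Z @ W_d          # decode: reconstruct the 10-D input
    err = X_hat - X
    # Gradients of the mean squared reconstruction error
    grad_Wd = Z.T @ err / len(X)
    grad_We = X.T @ (err @ W_d.T) / len(X)
    W_d -= lr * grad_Wd
    W_e -= lr * grad_We

# Inference: the decoder is dropped, and the trained encoder alone performs
# dimensionality reduction (10-D -> 2-D) for clustering, classification, etc.
Z = X @ W_e
print(Z.shape)  # (200, 2)
```

The design point is exactly the one above: the decoder is only needed to create the training signal (reconstruction error); once trained, the encoder is a standalone compression function.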

Best regards