Is language translation possible in a decoder-only architecture?

I am currently studying NLP in PyTorch for Deep Learning (module 3). In the course material, the example given for the encoder-decoder architecture is language translation. The course materials, like those of the previous modules, are excellent.

My question: nowhere is it mentioned that a decoder-only architecture can also succeed at language translation. From what I have read in other sources, this was a bit surprising to researchers initially, and yet GPT-like transformers (decoder-only) do in fact perform translation.

My first question: does ChatGPT use an encoder-decoder for language translation, or a decoder-only transformer? If it is a decoder-only transformer, my next question is: can anyone point me to a resource that explains how and why it works for language translation?

It is a mystery to me how a decoder-only transformer can jump from English to another language through word-by-word prediction.

hi @yildirimga

GPT is a decoder-only model, built from the decoder part of the original Transformer, whereas the original Transformer model ("Attention Is All You Need") is an encoder-decoder model.

The Transformer architecture was later used to create many different kinds of models:

Encoder-only models (BERT, Bidirectional Encoder Representations from Transformers) process input text to understand its context and meaning, creating rich numerical representations (embeddings). They are used in tasks like text classification and Named Entity Recognition (NER).
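As a minimal sketch of the encoder-only idea in PyTorch (toy sizes, untrained weights — not BERT's real dimensions): no attention mask is applied, so every token attends to every other token in both directions, which is what makes the resulting embeddings contextual.

```python
import torch
import torch.nn as nn

# Encoder-only sketch: bidirectional self-attention, no causal mask.
vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

tokens = torch.randint(0, vocab_size, (1, 6))   # one 6-token "sentence"
embeddings = encoder(embed(tokens))             # (1, 6, 32): one vector per token
sentence_vec = embeddings.mean(dim=1)           # simple pooling, e.g. for classification
```

Each token gets its own 32-dimensional contextual vector; pooling them gives a single sentence vector you could feed to a classifier head.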

Decoder-only models (GPT, LLaMA; ChatGPT is built on one) generate text by predicting the next word based on the previous context.
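A minimal sketch of that decoder-only mechanism (again toy sizes, untrained weights): the same Transformer blocks as above, but with a causal (lower-triangular) mask so position i can only attend to positions at or before i, and a linear head that turns the last hidden state into next-token scores.

```python
import torch
import torch.nn as nn

# Decoder-only sketch: causal mask + language-modeling head.
vocab_size, d_model, seq_len = 100, 32, 8
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
blocks = nn.TransformerEncoder(layer, num_layers=2)
lm_head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, seq_len))
# -inf above the diagonal blocks attention to future positions.
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

hidden = blocks(embed(tokens), mask=causal_mask)  # each position sees only the past
logits = lm_head(hidden)                          # (1, seq_len, vocab_size)
next_token = logits[0, -1].argmax().item()        # greedy next-word prediction
```

Generation is just this step in a loop: append `next_token` to the input and predict again.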

Lastly, encoder-decoder models (T5 and BART): T5 (Text-to-Text Transfer Transformer) unites all tasks into a text-to-text format, making it highly flexible, while BART (Bidirectional and Auto-Regressive Transformers) both understands and generates text thanks to its denoising-autoencoder pre-training, often outperforming T5 in tasks like summarization. Both are popular for sequence-to-sequence tasks, but most studies have shown that BART often yields slightly better summarization (higher ROUGE scores), while T5 offers unique strengths in text simplification.
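The encoder-decoder wiring itself can be sketched with PyTorch's built-in `nn.Transformer` (random tensors stand in for embedded sentences; sizes are toy values): the encoder reads the source, and the decoder attends to the encoder's output via cross-attention while producing one state per target position.

```python
import torch
import torch.nn as nn

# Encoder-decoder sketch: source goes through the encoder, target prefix
# through the decoder, which cross-attends to the encoder output.
model = nn.Transformer(d_model=32, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)
src = torch.rand(1, 10, 32)  # embedded source, e.g. an English sentence
tgt = torch.rand(1, 7, 32)   # embedded target prefix generated so far
# Causal mask so the decoder cannot peek at future target tokens.
tgt_mask = torch.triu(torch.full((7, 7), float("-inf")), diagonal=1)

out = model(src, tgt, tgt_mask=tgt_mask)  # (1, 7, 32): one state per target slot
```

A language-modeling head on `out` would then score the next target word at each slot, exactly as in the course's translation example.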

For your translation query: in an encoder-decoder (sequence-to-sequence) model, the encoder reads the input sentence and creates a numerical representation (a vector), and the decoder uses that representation to generate the output sentence word by word; the attention mechanism weighs the importance of different words in the sentence, letting the model handle long-range dependencies and produce highly accurate translations. In a decoder-only model, translation is instead framed as continuation: the English sentence (plus an instruction or a few examples) forms the prefix of one long sequence, and a model trained on multilingual data learns that the most likely next tokens after such a prefix are the translation, so there is no "switch" between languages at all, just next-token prediction.
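To make that concrete, here is a toy illustration of decoder-only translation. The source and target share one sequence, and the model just keeps predicting the next token until an end marker. `toy_next_token` is a hypothetical hard-coded stand-in for a trained language model; a real model computes probabilities over learned subword tokens instead.

```python
# One sequence holds the instruction, the English source, and (as it is
# generated) the French target.
prompt = ["Translate", "English", "to", "French", ":", "the", "cat", "=>"]

def toy_next_token(context):
    # Hypothetical stand-in for a trained LM: a lookup keyed on the
    # last token, purely to show the autoregressive loop.
    table = {"=>": "le", "le": "chat", "chat": "<eos>"}
    return table[context[-1]]

sequence = list(prompt)
while sequence[-1] != "<eos>":
    sequence.append(toy_next_token(sequence))

translation = sequence[len(prompt):-1]
print(translation)  # → ['le', 'chat']
```

The "jump" between languages never happens explicitly: the model has simply learned that, after a prefix like this, French tokens are the most probable continuation.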

I don’t know if you have done the NLP Specialization, but if you are interested in how these chatbots respond, or want a core understanding behind it, the NLP Specialization is great, followed by the RAG techniques course, which covers the different NLP methods for creating your own text-to-text or text-to-image generators.

regards

DP