Is language translation possible in a decoder-only architecture?

I am currently studying NLP in PyTorch for Deep Learning (module 3). In the course material, the example given for the encoder-decoder architecture is language translation. The course materials, like those of the previous modules, are excellent.

My question: nowhere is it mentioned that a decoder-only architecture can also succeed at language translation. From what I have read in other sources, this was a bit surprising to researchers initially, and yet GPT-like transformers (decoder-only) do in fact perform translation.

My first question: does ChatGPT use an encoder-decoder for language translation, or a decoder-only transformer? If it is a decoder-only transformer, my next question is: can anyone point me to a resource that explains how and why it works for language translation?

It is a mystery to me how a decoder-only transformer can jump from English to another language through word-by-word prediction.

hi @yildirimga

GPT is a decoder-only model, built from the decoder part of the original Transformer, whereas the original Transformer model ("Attention Is All You Need") is an encoder-decoder model.

The Transformer architecture was later used to create many different kinds of models:

Encoder-only models (BERT, Bidirectional Encoder Representations from Transformers) process input text to understand its context and meaning, creating rich numerical representations (embeddings). They are used in tasks like text classification and Named Entity Recognition (NER).
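As a minimal sketch of the encoder-only idea in PyTorch (toy sizes, untrained weights — not BERT's real dimensions): no attention mask is applied, so every token attends to every other token in both directions, which is what makes the resulting embeddings contextual.

```python
import torch
import torch.nn as nn

# Encoder-only sketch: bidirectional self-attention, no causal mask.
vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

tokens = torch.randint(0, vocab_size, (1, 6))   # one 6-token "sentence"
embeddings = encoder(embed(tokens))             # (1, 6, 32): one vector per token
sentence_vec = embeddings.mean(dim=1)           # simple pooling, e.g. for classification
```

Each token gets its own 32-dimensional contextual vector; pooling them gives a single sentence vector you could feed to a classifier head.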

Decoder-only models (GPT, LLaMA; ChatGPT is built on one) generate text by predicting the next word based on the previous context.
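A minimal sketch of that decoder-only mechanism (again toy sizes, untrained weights): the same Transformer blocks as above, but with a causal (lower-triangular) mask so position i can only attend to positions at or before i, and a linear head that turns the last hidden state into next-token scores.

```python
import torch
import torch.nn as nn

# Decoder-only sketch: causal mask + language-modeling head.
vocab_size, d_model, seq_len = 100, 32, 8
embed = nn.Embedding(vocab_size, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
blocks = nn.TransformerEncoder(layer, num_layers=2)
lm_head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, seq_len))
# -inf above the diagonal blocks attention to future positions.
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

hidden = blocks(embed(tokens), mask=causal_mask)  # each position sees only the past
logits = lm_head(hidden)                          # (1, seq_len, vocab_size)
next_token = logits[0, -1].argmax().item()        # greedy next-word prediction
```

Generation is just this step in a loop: append `next_token` to the input and predict again.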

Lastly, encoder-decoder models (T5 and BART): T5 (Text-to-Text Transfer Transformer) unites all tasks into a text-to-text format, making it highly flexible, while BART (Bidirectional and Auto-Regressive Transformers) both understands and generates text thanks to its denoising-autoencoder pre-training, often outperforming T5 in tasks like summarization. Both are popular for sequence-to-sequence tasks, but most studies have shown that BART often yields slightly better summarization (higher ROUGE scores), while T5 offers unique strengths in text simplification.
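The encoder-decoder wiring itself can be sketched with PyTorch's built-in `nn.Transformer` (random tensors stand in for embedded sentences; sizes are toy values): the encoder reads the source, and the decoder attends to the encoder's output via cross-attention while producing one state per target position.

```python
import torch
import torch.nn as nn

# Encoder-decoder sketch: source goes through the encoder, target prefix
# through the decoder, which cross-attends to the encoder output.
model = nn.Transformer(d_model=32, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)
src = torch.rand(1, 10, 32)  # embedded source, e.g. an English sentence
tgt = torch.rand(1, 7, 32)   # embedded target prefix generated so far
# Causal mask so the decoder cannot peek at future target tokens.
tgt_mask = torch.triu(torch.full((7, 7), float("-inf")), diagonal=1)

out = model(src, tgt, tgt_mask=tgt_mask)  # (1, 7, 32): one state per target slot
```

A language-modeling head on `out` would then score the next target word at each slot, exactly as in the course's translation example.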

For your translation query: in an encoder-decoder (sequence-to-sequence) model, the encoder reads the input sentence and creates a numerical representation (a vector), and the decoder uses that representation to generate the output sentence word by word; the attention mechanism weighs the importance of different words in the sentence, letting the model handle long-range dependencies and produce highly accurate translations. In a decoder-only model, translation is instead framed as continuation: the English sentence (plus an instruction or a few examples) forms the prefix of one long sequence, and a model trained on multilingual data learns that the most likely next tokens after such a prefix are the translation, so there is no "switch" between languages at all, just next-token prediction.
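To make that concrete, here is a toy illustration of decoder-only translation. The source and target share one sequence, and the model just keeps predicting the next token until an end marker. `toy_next_token` is a hypothetical hard-coded stand-in for a trained language model; a real model computes probabilities over learned subword tokens instead.

```python
# One sequence holds the instruction, the English source, and (as it is
# generated) the French target.
prompt = ["Translate", "English", "to", "French", ":", "the", "cat", "=>"]

def toy_next_token(context):
    # Hypothetical stand-in for a trained LM: a lookup keyed on the
    # last token, purely to show the autoregressive loop.
    table = {"=>": "le", "le": "chat", "chat": "<eos>"}
    return table[context[-1]]

sequence = list(prompt)
while sequence[-1] != "<eos>":
    sequence.append(toy_next_token(sequence))

translation = sequence[len(prompt):-1]
print(translation)  # → ['le', 'chat']
```

The "jump" between languages never happens explicitly: the model has simply learned that, after a prefix like this, French tokens are the most probable continuation.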

I don’t know if you have done the NLP Specialization, but if you are interested in how these chatbots respond, or want a core understanding behind it, the NLP Specialization is great, followed by the RAG techniques course, which covers the different NLP methods for creating your own text-to-text or text-to-image generators.

regards

DP