How does next word prediction work for language translation?

QihangHuang · July 20, 2023, 10:52pm

Do we need two different LLMs for language translation versus generative tasks? If we use the same general LLM for next-word prediction on “I love machine”, how does the system know whether to translate the sentence into a different language (e.g. French) or to predict the next word in English from the current context (e.g. predicting “learning”, so that the full sentence becomes “I love machine learning”)?

This question came up while I was watching the video on the Transformer architecture, at the point where the initial word is fed into the input of the decoder.

Juan_Olano · July 20, 2023, 11:30pm

This is an interesting question @QihangHuang.

I would like to split the answer in two:

Answer 1: Seq-to-Seq models
These models are built from both parts of a transformer: an encoder and a decoder. The encoder takes the source sentence and produces a representation (in practice, one vector per input token) that captures the semantics of the sentence. This representation is passed to the decoder, which uses it, together with the patterns learned in training, to predict one word at a time. After the first word, every later prediction takes as input the encoder's representation plus all the words predicted so far. The model translates because it was trained on pairs of source and target sentences, so translating is the only thing it has learned to do. And this is how it works for Seq-to-Seq models.
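
To make the loop above concrete, here is a minimal toy sketch in Python. It is not a real Transformer: the `TOY_LEXICON` lookup table and the `encode`, `decode_step`, and `translate` functions are hypothetical stand-ins I made up for illustration. The only point is the control flow: the encoder runs once, and each decoder step sees the encoder output plus everything generated so far.

```python
# Toy sketch of encoder-decoder greedy decoding. The "model" is a hand-written
# lookup table (an assumption for illustration), not a trained Transformer.

# Hypothetical word-level table standing in for learned weights.
TOY_LEXICON = {"i": "j'", "love": "adore", "machines": "les machines"}


def encode(source_tokens):
    """Stand-in encoder: returns one 'representation' per source token."""
    return [TOY_LEXICON[token] for token in source_tokens]


def decode_step(encoder_output, generated):
    """Stand-in decoder step: sees the encoder output plus everything
    generated so far, and returns the next token (or <eos> when done)."""
    position = len(generated)
    if position < len(encoder_output):
        return encoder_output[position]
    return "<eos>"


def translate(source_tokens, max_steps=10):
    encoder_output = encode(source_tokens)  # computed once per source sentence
    generated = []
    for _ in range(max_steps):
        next_token = decode_step(encoder_output, generated)
        if next_token == "<eos>":
            break
        generated.append(next_token)  # fed back into the decoder on the next step
    return generated


print(translate(["i", "love", "machines"]))
# → ["j'", 'adore', 'les machines']
```

In a real model, `decode_step` would be the decoder stack attending to the encoder output, and the next token would be picked from a probability distribution over the vocabulary rather than from a table.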

Answer 2: Decoder-only models (like GPT - ChatGPT)
These work differently. If you just say “I love machines”, the model will add more text in English. To have it translate, you have to say so in the prompt, for example “Please translate to French the following sentence: I love machines”. In this case, the decoder is 100% “guessing” the next word, then the next, and the next. It works in part thanks to the huge amount of data the model was trained on, which lets it pick up patterns and become very good at predicting the next word. The second part, and the most interesting one, is that it works for reasons that are still not fully understood.
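
Here is a small toy sketch of that idea in Python: one next-word predictor, where only the prompt decides whether the continuation is English or French. Everything in it is an assumption for illustration: the two-line `CORPUS` is made up, and the "model" is a simple count-based predictor that backs off to shorter contexts, which is far simpler than GPT. It only shows that the same `next_token` function gives different continuations for different contexts.

```python
from collections import Counter, defaultdict

# Tiny made-up "training corpus" (an assumption for illustration only).
CORPUS = [
    "<bos> i love machines that learn <eos>",
    "<bos> translate to french : i love machines => j'adore les machines <eos>",
]
MAX_CONTEXT = 16  # longest context (in tokens) the toy model looks at


def train(corpus):
    """Count which token follows each context seen in the corpus."""
    counts = defaultdict(Counter)
    for line in corpus:
        tokens = line.split()
        for i in range(1, len(tokens)):
            for n in range(1, min(i, MAX_CONTEXT) + 1):
                counts[tuple(tokens[i - n:i])][tokens[i]] += 1
    return counts


def next_token(counts, context):
    """Predict the next token from the longest context suffix seen in training."""
    for n in range(min(len(context), MAX_CONTEXT), 0, -1):
        suffix = tuple(context[-n:])
        if suffix in counts:
            return counts[suffix].most_common(1)[0][0]
    return "<eos>"


def generate(counts, prompt, max_new_tokens=10):
    """Repeatedly predict the next token and append it to the context."""
    context = ["<bos>"] + prompt.split()
    new_tokens = []
    for _ in range(max_new_tokens):
        token = next_token(counts, context)
        if token == "<eos>":
            break
        context.append(token)
        new_tokens.append(token)
    return " ".join(new_tokens)


model = train(CORPUS)
print(generate(model, "i love machines"))
# → that learn
print(generate(model, "translate to french : i love machines =>"))
# → j'adore les machines
```

Both calls use the same model and the same function. The instruction at the start of the second prompt is just more context, and that context is what pushes the prediction toward French.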

Please share thoughts, more questions, or comments!

Fascinating topic.
