Your understanding is correct! To complement that: teacher forcing during training helps the model learn the correct sequence, so at inference time it is better at predicting the next word when given the encoded context. However, during inference the predicted word is fed back as the input for the next step instead of the correct target (because you don't have the ground truth), so an early mistake can compound over the rest of the sequence. That is why beam search is used to explore multiple possibilities instead of just one.
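Here is a minimal sketch of that difference, assuming a made-up toy vocabulary and a stand-in `decoder_step` function (a real model's decoder would replace it). During training the ground-truth previous word is fed in at every step; during inference the model's own prediction is fed back.

```python
import numpy as np

# Toy stand-ins for illustration only: a tiny vocabulary and a hypothetical
# one-step decoder that returns a probability distribution over the next word
# given the encoded context and the previous word.
VOCAB = ["<s>", "</s>", "Jane", "visits", "Africa", "in", "September"]

def decoder_step(context, prev_word, rng):
    """Hypothetical decoder step: here just a random softmax for illustration."""
    logits = rng.normal(size=len(VOCAB))
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs

def train_step_teacher_forcing(context, target_words, rng):
    """Training: feed the ground-truth previous word at every step."""
    losses = []
    prev = "<s>"
    for gold in target_words:
        probs = decoder_step(context, prev, rng)
        losses.append(-np.log(probs[VOCAB.index(gold)]))  # cross-entropy term
        prev = gold  # teacher forcing: the true word becomes the next input
    return float(np.mean(losses))

def greedy_decode(context, max_len, rng):
    """Inference: feed back the model's own prediction (no ground truth)."""
    prev, output = "<s>", []
    for _ in range(max_len):
        probs = decoder_step(context, prev, rng)
        prev = VOCAB[int(np.argmax(probs))]  # the model's guess becomes the input
        if prev == "</s>":
            break
        output.append(prev)
    return output
```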
After selecting the top-N words in the first step, each one is used as a new starting point to predict the next word. The decoder considers the context (the encoder's hidden states) together with that first word to generate a probability distribution for the second word, and so on. Each partial sequence is scored by its cumulative probability (in practice, the sum of log probabilities). The search continues until the desired sequence length is reached or an end token is generated.
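Reusing the hypothetical `decoder_step` and `VOCAB` from the sketch above, this is roughly what that loop looks like: every surviving partial sequence is expanded with every possible next word, scored by its cumulative log probability, and only the `beam_width` best are kept.

```python
def beam_search(context, beam_width, max_len, rng):
    """Keep the beam_width highest-scoring partial sequences at every step."""
    # Each beam entry is (cumulative log probability, list of words, finished?)
    beams = [(0.0, ["<s>"], False)]
    for _ in range(max_len):
        candidates = []
        for score, words, done in beams:
            if done:  # finished sequences pass through unchanged
                candidates.append((score, words, True))
                continue
            probs = decoder_step(context, words[-1], rng)
            for i, p in enumerate(probs):  # expand with every possible next word
                candidates.append((score + np.log(p),
                                   words + [VOCAB[i]],
                                   VOCAB[i] == "</s>"))
        # Prune: keep only the beam_width best partial sequences
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
        if all(done for _, _, done in beams):
            break
    return beams
```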
In the French-to-English example, after selecting the top three words for the first position (“in,” “Jane,” “September”), beam search would predict the next word for each candidate by considering the context of both the input sentence and the chosen first word. As the sequence progresses, it keeps only the most probable partial translations at each step, discarding the unlikely candidates.
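To tie this back to the sketch, you could run it with a beam width of 3. The vocabulary and probabilities there are invented, but with a real model the three highest-scoring first words might indeed be “in,” “Jane,” and “September,” each then extended and re-scored.

```python
rng = np.random.default_rng(0)
context = None  # placeholder for the encoder's hidden states
for score, words, _ in beam_search(context, beam_width=3, max_len=6, rng=rng):
    print(f"{score:.2f}  {' '.join(words[1:])}")
```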