Refinements to Beam Search

Hi Sir,

We have some doubts about the lecture notes below; could you please help clarify?

**"Just to wrap up how you run beam search: as you run beam search, you see a lot of sentences with length 1, a lot of sentences with length 2, a lot of sentences with length 3, and so on. Maybe you run beam search for 30 steps, so you consider output sentences of up to length 30. With beam width 3, you would be keeping track of the top three possibilities for each of these possible sentence lengths 1, 2, 3, and so on up to 30, and store them with their scores. You can then take your top sentences and compute this objective function on the sentences that you have seen through the beam search process. Finally, of all the sentences you evaluate this way, you pick the one that achieves the highest value on this normalized log probability objective (sometimes called a normalized log likelihood objective), and that is the final translation you output."**
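As we understand it, the normalized objective being described is the length-normalized log-likelihood from the lecture, with a softening exponent $\alpha$ (typically around 0.7):

$$
\frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P\!\left(y^{\langle t \rangle} \mid x, y^{\langle 1 \rangle}, \ldots, y^{\langle t-1 \rangle}\right)
$$

where $T_y$ is the number of words in the candidate translation.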

Our doubt is: after each step we end up with 30,000 possible outputs (10,000 softmax outputs × beam width 3), and we pick the top 3 from among those 30,000. Do we compute the objective function only on the top 3 sentences after picking them, or do we compute the normalized log objective on all 30,000 outputs first and then pick the top 3?
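To make our doubt concrete, here is roughly how we understand one beam step to work. This is just a sketch; `log_prob_fn` is a hypothetical stand-in for the model's next-token distribution, not an actual course function:

```python
import math

def beam_step(beams, log_prob_fn, beam_width=3):
    """One step of beam search.

    beams:       list of (token_ids, summed_log_prob) pairs, at most beam_width of them.
    log_prob_fn: hypothetical model call returning log P(next token | prefix)
                 as a list over the vocabulary (e.g. 10,000 entries).
    """
    candidates = []
    for tokens, score in beams:
        log_probs = log_prob_fn(tokens)
        for token_id, lp in enumerate(log_probs):
            # 3 beams x 10,000 vocabulary entries = 30,000 candidates in total
            candidates.append((tokens + [token_id], score + lp))
    # Keep the top beam_width of the 30,000 by raw summed log probability;
    # every candidate at this step has the same length.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_width]

# Toy usage with a dummy 4-token vocabulary (uniform next-token distribution):
dummy = lambda tokens: [math.log(0.25)] * 4
print(beam_step([([0], 0.0), ([1], -0.1), ([2], -0.2)], dummy))
```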

Use length normalization to pick the best output AFTER candidate generation (i.e., after running beam search and keeping the top 3 at each of the 30 steps).
Using length normalization during candidate generation won't make a difference, since you'd be comparing sentences of equal length at each step.
So you only need to compute the normalized objective on the candidates you kept (at most 3 per length, i.e. 3 × 30 = 90 sentences), not on all 30,000 per-step expansions.
The purpose of length normalization is to ensure that you don't penalize longer sentences, whose product of probabilities would otherwise be small.
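Here is a minimal sketch of that final rescoring step, assuming each kept candidate carries its summed log probability (the helper name is made up for illustration):

```python
def pick_final_translation(finished, alpha=0.7):
    """Length-normalized rescoring of completed candidates.

    finished: list of (token_ids, summed_log_prob) pairs collected across
              all lengths (up to beam_width per length, e.g. 3 x 30 = 90).
    alpha:    softening exponent from the lecture (alpha=1 is full
              normalization, alpha=0 is none).
    """
    def normalized_score(candidate):
        tokens, log_prob = candidate
        return log_prob / (len(tokens) ** alpha)

    return max(finished, key=normalized_score)

# A longer sentence with a lower raw score can still win after
# normalization, since summed log probabilities shrink with length:
short = ([1, 2, 3], -6.0)            # normalized: -6.0 / 3**0.7 ~ -2.78
long_ = ([1, 2, 3, 4, 5, 6], -9.0)   # normalized: -9.0 / 6**0.7 ~ -2.57
print(pick_final_translation([short, long_]))  # picks the longer sentence
```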
