Hi Sir,
From the below lecture notes, we are having doubts can you please help to clarify ?
**Just to wrap up how you can beam search, as you run beam search you see a lot of sentences with length equal 1, length sentences were equal 2, length sentence that equals 3 and so on, **
and maybe you run beam search for 30 steps you consider, output sentences up to 30, let’s say beam width three, you would be keeping track of the top three possibilities for each of these possible sentence length 1, 2, 3, 4, and so on up to 30 and store them against this score and so you can take your top sentences and just computes this objective function on the sentences that you have seen through the beam search process. Then finally, of all these sentences that you evaluate this way, you pick the one to the achieves, the highest value on this normalize log probability objective, sometimes it’s called a normalized log likelihood objective and then that would be the final translation you all output.
Our doubt is, Actually , after each step we would end up with 30,000 possilble outputs (if 10,000 softmax output & beam width = 3) so we pick top 3 output from among 30,000 outputs. Here doubt is, after picking top 3 sentences only we need to compute againt objective function or for all the 30,000 outputs we need to compute the objective log function then pick the top 3 sentences ?