Beam Search at Training Time

Hi, I need one clarification about how beam search is applied in a machine translation task. Do we apply beam search at training time as well? Or do we just take the top word after applying the softmax, compute the loss for that step, and then teacher-force the correct output as the next step’s input?


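It depends on your implementation. In the standard setup, training uses teacher forcing. At each step the loss is the cross-entropy of the reference token under the softmax output, so you don't need to take the top word at all. The reference token, not the model's prediction, is then fed as the next step's input. Beam search is used only at inference, when there is no reference to feed. Some training schemes, such as beam-search optimization, do run a search during training, but that is not the usual setup. You can have a look at this discussion on Stack Overflow.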
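A minimal sketch of the usual split, assuming a hypothetical toy bigram table (`PROBS`) in place of a real decoder. A real NMT decoder would condition on the source sentence and the whole target prefix. All names here (`next_token_probs`, `teacher_forced_loss`, `beam_search`) are illustrative and not from any library. Training scores the reference tokens step by step with no search. Beam search only appears at decoding time.

```python
import math

# Hypothetical toy "decoder": next-token probabilities conditioned only on the
# previous token (a bigram table). A real NMT decoder would also condition on
# the source sentence and the full target prefix.
PROBS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.4, "</s>": 0.1},
    "a": {"cat": 0.9, "</s>": 0.1},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.2, "</s>": 0.8},
    "sat": {"</s>": 1.0},
}


def next_token_probs(prev):
    """Return the (hypothetical) softmax output given the previous token."""
    return PROBS[prev]


def teacher_forced_loss(reference):
    """Training: sum of per-step cross-entropy against the reference tokens.

    The input at each step is the *reference* previous token (teacher
    forcing), so no argmax and no beam search are involved.
    """
    loss = 0.0
    prev = "<s>"
    for gold in reference:
        loss -= math.log(next_token_probs(prev)[gold])
        prev = gold  # feed the gold token, not the model's prediction
    return loss


def beam_search(beam_width, max_len=10):
    """Inference: keep the beam_width best prefixes by total log-probability."""
    beams = [(0.0, ["<s>"])]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for score, seq in beams:
            for tok, p in next_token_probs(seq[-1]).items():
                candidates.append((score + math.log(p), seq + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        # Keep the top beam_width; hypotheses ending in </s> are complete.
        beams = []
        for score, seq in candidates[:beam_width]:
            (finished if seq[-1] == "</s>" else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # unfinished hypotheses if max_len was reached
    best_score, best_seq = max(finished, key=lambda c: c[0])
    return best_seq[1:], best_score  # drop the <s> start token
```

With `beam_search(1)` (greedy decoding) this returns `the cat sat </s>` with probability 0.21, while `beam_search(2)` finds `a cat sat </s>` with probability 0.252. A wider beam can recover a better sequence at inference, but during training nothing is searched: each step just scores the reference token.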