Beam Search: P(y^*|x) \le P(\hat{y}|x)

Sorry, but I am still confused about this.

I can get the case where P(y^*|x) > P(\hat{y}|x), but not so much the other way around.

I mean, granted, as Prof. Ng explains, there may be more than one y^*, even a set of acceptable human translations, but in the end aren’t these the very targets you are training your model on (trying to optimize for)?

So how on Earth does it ‘reverse’ itself (i.e., the predicted sentence ends up more probable than the correct translation)?

I know he says ‘work on the RNN’, but does this mean ‘more training’, ‘more data’, or what?

*I should also add that, at least from what we’ve seen so far, you cannot simply ‘add more layers’ at each time step, or at least not without quite significant computational cost.


P(y^*|x) \lt P(\hat{y}|x) says that the model considers the human translation less probable than the machine-generated one.

This means that the language model doesn’t have a good enough understanding of the source / target languages. Adding more relevant data and training a bigger model for longer generally won’t hurt performance. Model training tips are already covered in parts 2 and 3 of the specialization.
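As a sketch of the error-attribution rule being discussed (function and variable names are mine, and the numbers are made up): given the model's log-probabilities for the human reference y^* and the beam-search output \hat{y}, each dev-set error is attributed either to the search or to the RNN:

```python
def attribute_error(log_p_ystar: float, log_p_yhat: float) -> str:
    """Attribute a translation error to beam search or to the RNN.

    log_p_ystar: model log-probability of the human reference y*
    log_p_yhat:  model log-probability of the beam-search output y-hat
    """
    if log_p_ystar > log_p_yhat:
        # The model prefers y*, yet beam search returned y-hat:
        # the (approximate) search failed to find the higher-scoring sequence.
        return "beam search"
    else:
        # The model itself ranks the wrong translation at least as high:
        # the RNN's probability estimates are at fault.
        return "RNN"

# Hypothetical (log P(y*|x), log P(y-hat|x)) pairs for three dev-set errors.
errors = [(-12.3, -9.1), (-7.5, -8.2), (-10.0, -6.4)]
faults = [attribute_error(a, b) for a, b in errors]
print(faults.count("beam search") / len(faults))  # fraction attributable to search
```

Tallying these fractions over the dev set tells you whether to spend effort on a larger beam width or on the model itself.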


@balaji.ambresh yes, I understand the interpretation, or what it is suggesting.

I guess the takeaway then is that this ‘error detection’ method is more about determining where beam search itself fails.

And I am aware of the model training tips previously presented, but at best Prof. Ng presents perhaps a two-layer LSTM model in this class and suggests even that can be difficult to train. Thus it struck me that the situation is a bit different here than, say, ResNets, where you can have hundreds of layers.

So I wondered, ‘Okay, how do we do this?’

Beyond the obvious, I hope my question makes sense.


The aim is to determine whether the problem is with beam search or with the underlying model. There’s no direct answer in terms of which recurrent network to use. Try NAS or a domain- / task-specific pre-trained transformer, depending on the resources available to you.
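For completeness, the quantities being compared are sequence log-probabilities, typically computed by summing the per-token conditional log-probabilities under the model. A toy sketch (the per-token probabilities are made up; in practice they come from the decoder's softmax at each step):

```python
import math

def sequence_log_prob(step_probs):
    """Sum per-token conditional probabilities in log space:
    log P(y|x) = sum_t log P(y_t | x, y_1..y_{t-1}).
    Working in log space avoids numerical underflow for long sequences.
    """
    return sum(math.log(p) for p in step_probs)

# Hypothetical per-token probabilities for a 4-token translation.
probs_ystar = [0.9, 0.2, 0.8, 0.7]   # human reference y*
probs_yhat  = [0.8, 0.6, 0.7, 0.7]   # beam-search output y-hat
print(sequence_log_prob(probs_ystar) < sequence_log_prob(probs_yhat))  # → True
```

Here one low-probability token (0.2) is enough to make the model score the human reference below the machine output, i.e. the P(y^*|x) \lt P(\hat{y}|x) case: the fault lies with the model's estimates, not with beam search.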
