Why are trigram and 4-gram giving better results than 5-gram in the week 3 assignment?

Why do the trigram and 4-gram models give better results than the 5-gram model in the C2W3 assignment (at the end of the assignment, after exercise 11)? Ideally, higher-order n-grams should give better results, right?

Not necessarily; this is not a hard rule. The more grams you use, the larger the context the model has to remember. Depending on the kind of recurrent memory cells you use, that context may be spread over many words and diminished, so the right context can be lost.
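
To make the "context gets lost" point a bit more concrete: if the model is count-based (my assumption here, not something confirmed in this thread), one way to see the effect is to check how often the n-grams in new text were ever seen during training. The sketch below uses a made-up toy corpus, and `ngrams` and `seen_fraction` are hypothetical helper names, not functions from the assignment. It only illustrates the general trend that longer n-grams are matched less often.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-gram in `tokens` as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def seen_fraction(train_tokens, test_tokens, n):
    """Fraction of n-grams in the test text that also occur in the training text."""
    train_counts = Counter(ngrams(train_tokens, n))
    test_ngrams = ngrams(test_tokens, n)
    seen = sum(1 for gram in test_ngrams if train_counts[gram] > 0)
    return seen / len(test_ngrams)


# Toy data, invented for illustration only.
train = ("the cat sat on the mat . the dog sat on the rug . "
         "the cat chased the dog .").split()
test = "the dog sat on the mat . the cat sat on the rug .".split()

# Coverage of the test text for n = 1 .. 5.
coverage = {n: seen_fraction(train, test, n) for n in range(1, 6)}
for n, frac in coverage.items():
    print(f"{n}-gram coverage: {frac:.2f}")
```

On this toy data the coverage drops once n gets large enough. When a longer n-gram was never seen in training, its estimated probability comes mostly from smoothing rather than from real counts, which is one plausible reason a 5-gram model can score worse than a trigram or 4-gram model on a small corpus.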
