Why do the trigram and 4-gram models give better results than the 5-gram model in the C2W3 assignment (at the end of the assignment, after Exercise 11)? Ideally, higher-order n-grams should give better results, right?
This is not a hard rule, but the more grams you use, the larger the context the model has to remember. Depending on the kind of recurrent memory cells you use, that context can end up spread thin and diminished over many words, so the right context could be lost.
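One way to see a related effect for plain count-based n-grams (assuming that is what the assignment builds) is to check how often each n-gram actually repeats as n grows. The sketch below is only an illustration: the toy corpus and the helper names `count_ngrams` and `singleton_fraction` are made up here and are not taken from the assignment. It prints, for each n, the fraction of distinct n-grams that occur exactly once. When nearly every long n-gram is seen only once, the counts behind a 5-gram estimate carry very little evidence.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def singleton_fraction(tokens, n):
    """Fraction of distinct n-grams that occur exactly once."""
    counts = count_ngrams(tokens, n)
    if not counts:
        return 0.0
    singles = sum(1 for c in counts.values() if c == 1)
    return singles / len(counts)


# Toy corpus, invented for illustration (not the assignment data).
corpus = (
    "i like a cat . i like a dog . this dog is like a cat . "
    "a cat is like a dog ."
).split()

for n in range(1, 6):
    # Print n next to the share of n-grams seen only once.
    print(n, round(singleton_fraction(corpus, n), 2))
```

On a real corpus the exact numbers will differ, but if the share of one-off n-grams climbs quickly with n, that would be consistent with the trigram and 4-gram models doing better than the 5-gram one.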