Question about chapter 5, week 3: "Attention Model Intuition"

At the beginning of the video, there is a graph that shows the BLEU score for various sentence lengths.

The BLEU score is also low for short sentences.
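For anyone who wants to experiment with how BLEU behaves on short sentences, here is a minimal sketch of unsmoothed sentence-level BLEU (one reference, uniform weights over 1- to 4-grams, clipped n-gram precision and brevity penalty as in the standard definition). This is an assumption for illustration only: the lecture's plot may be computed differently (for example corpus-level BLEU per length bucket), so this is not necessarily what the lecture means by "hard to get right".

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights and one reference."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        matched = sum(min(count, ref[g]) for g, count in cand.items())
        if total == 0 or matched == 0:
            # No n-grams of this order (or none matched): unsmoothed BLEU is 0.
            return 0.0
        log_precisions.append(math.log(matched / total))
    c, r = len(candidate), len(reference)
    # Brevity penalty: punish candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# One wrong word in a short sentence vs. one wrong word in a longer sentence.
short_ref = "the cat sat on the mat".split()
short_hyp = "the cat sat on a mat".split()
long_ref = ("the committee said on tuesday that it would review the proposal "
            "carefully before making any final decision next month").split()
long_hyp = [("examine" if w == "review" else w) for w in long_ref]

print(sentence_bleu(short_hyp, short_ref))
print(sentence_bleu(long_hyp, long_ref))
```

With the same single-word error, the 6-word sentence loses a much larger share of its matching n-grams than the 19-word one, so its score drops further; and a perfect translation shorter than 4 words scores 0 here because it has no 4-grams at all.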

I’m a little confused about this. Why is it so low?

In the naive encoder-decoder model, I think the longer the sentence, the more information has to be compressed into the single activation that the decoder receives as input, so performance gets worse.
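To make that fixed-size bottleneck concrete, here is a toy sketch (not the lecture's model): a plain RNN encoder with random, untrained weights that reads token ids one at a time. The names `make_toy_encoder`, `vocab_size` and `hidden_size` are hypothetical, chosen only for illustration.

```python
import math
import random


def make_toy_encoder(vocab_size, hidden_size, seed=0):
    """Build a toy RNN encoder with random (untrained) weights."""
    rng = random.Random(seed)
    embed = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(vocab_size)]
    w_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
            for _ in range(hidden_size)]

    def encode(token_ids):
        h = [0.0] * hidden_size
        for t in token_ids:
            # h_new[i] = tanh(embed[t][i] + sum_j w_hh[i][j] * h[j])
            h = [math.tanh(embed[t][i] + sum(w_hh[i][j] * h[j] for j in range(hidden_size)))
                 for i in range(hidden_size)]
        # Final hidden state: the only thing a naive decoder would receive.
        return h

    return encode


encode = make_toy_encoder(vocab_size=100, hidden_size=8)
print(len(encode([1, 2, 3])))         # 3-token sentence
print(len(encode(list(range(50)))))   # 50-token sentence
```

Whether the input has 3 tokens or 50, `encode` returns `hidden_size` numbers, so a longer sentence has to be squeezed into the same space; attention sidesteps this by letting the decoder look at every encoder hidden state instead of only the last one.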

But what about short sentences?

What does “Short sentences are hard to get all the words” mean?

Thanks.

I reviewed the lecture, and I don’t understand why he added “short sentences are hard to get right”. It seems to me that a short sentence would be easier.