In this week’s lecture, the instructor mentioned that “It is cold” and “It is not cold” differ by only one word but have very different meanings. I could not find any solution to this problem in the lecture.
Another example from the lecture is “I like drinking coffee” and “I abhor shipping coffee”: their similarity does not seem to be evaluated properly by ROUGE or BLEU, the metrics discussed in the lecture.
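To make the concern concrete, here is a minimal sketch of a word-overlap score for the two pairs above. It is my own simplified, ROUGE-1-style unigram F1 (lowercased, whitespace-tokenized), not the official ROUGE or BLEU implementation, and the function name `unigram_f1` is just something I made up for illustration.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1-style F1: clipped unigram overlap only.

    Assumption: lowercase + whitespace tokenization, no stemming.
    This is an illustrative sketch, not the official metric.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count per word (clipping).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


pairs = [
    ("It is cold", "It is not cold"),
    ("I like drinking coffee", "I abhor shipping coffee"),
]

for reference, candidate in pairs:
    score = unigram_f1(reference, candidate)
    print(f"{reference!r} vs {candidate!r}: {score:.3f}")
# 'It is cold' vs 'It is not cold': 0.857
# 'I like drinking coffee' vs 'I abhor shipping coffee': 0.500
```

With this sketch, the negated sentence still scores about 0.86 against its opposite, because the score counts shared words and knows nothing about meaning.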
Is there a remedy for these evaluation issues, or do we simply accept them in practice because automatic evaluation is difficult?