ROUGE and BLEU metrics


According to the video on model evaluation, at the very end, it's said that we can use the ROUGE score to evaluate summarization models and BLEU for translation tasks. However, for translation tasks, we will never have the same tokens in the completion as in the prompt, by definition. So how can we evaluate the model's performance using n-grams if the n-grams will be different in each language?

Thank you!

As far as I understand, BLEU will take the output and compare it with a human reference (or a reference defined by the architect). So it doesn’t matter that the input and output n-grams are different.

We aren't comparing the tokens in the prompt and the completion. We are comparing the generated completion with the reference (human-written) completion. Hence, sentences in French are compared only with sentences in French, not English.
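To make this concrete, here is a minimal pure-Python sketch of the clipped n-gram precision at the heart of BLEU. It compares a generated French sentence against a French reference only; the English prompt never enters the computation. (The real BLEU metric additionally combines several n-gram orders geometrically and applies a brevity penalty; this sketch and the example sentences are just for illustration.)

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: the fraction of the candidate's
    n-grams that also appear in the reference, with counts clipped
    so a repeated n-gram can't be credited more times than it
    occurs in the reference."""
    cand_tokens = candidate.split()
    ref_tokens = reference.split()
    cand_ngrams = Counter(tuple(cand_tokens[i:i + n])
                          for i in range(len(cand_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n])
                         for i in range(len(ref_tokens) - n + 1))
    overlap = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

# Both sentences are in French; we never compare against the English prompt.
reference = "le chat est sur le tapis"
candidate = "le chat est sur tapis"
print(ngram_precision(candidate, reference, 1))  # unigram precision: 1.0
print(ngram_precision(candidate, reference, 2))  # bigram precision: 0.75
```

Every unigram in the candidate appears in the reference, so unigram precision is perfect; the missing "le" breaks one bigram, so bigram precision drops, which is exactly the kind of fluency signal higher-order n-grams capture.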

Oh, I see. I was confused.