Error in video's ROUGE-1 `precision` calculation

brec · February 9, 2025, 6:20pm

At 6:32 in the video, where the reference sentence is "It is cold outside.", the transcript says:

Take, for example, this generated output,
cold, cold, cold, cold.

As this generated output contains one of the words from the reference sentence, it will score quite highly even though the same word is repeated multiple times. [emphasis added]

The Rouge-1 precision score will be perfect.

The precision calculation shown on the whiteboard in the video uses 4 as the number of unigram matches, ignoring the repetition that the transcript itself points out. But repetitions should not each be counted as a match: a word in the output can match at most as many times as it occurs in the reference. Since "cold" occurs once in the reference, the number of matches should be 1, and the precision score should be 1/4. See Google Research’s Python code at

google-research/rouge/rouge_scorer.py at master · google-research/google-research · GitHub

The number of matches, in pseudo-code, is:

matches = 0
for each distinct unigram in reference:
   matches += min(count in reference, count in output)
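As a rough illustration of the pseudo-code above, here is a minimal Python sketch of clipped unigram matching. The function names `tokenize`, `rouge1_clipped`, and `unclipped_precision` are made up for this post, and the tokenizer is a simplification, so this is not the Google Research implementation. The unclipped version is only my guess at how a count of 4 matches could arise.

```python
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, split on whitespace,
    # strip surrounding punctuation. Real scorers tokenize differently.
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]


def rouge1_clipped(reference, output):
    """Return (matches, precision, recall) using clipped unigram counts."""
    ref_counts = Counter(tokenize(reference))
    out_counts = Counter(tokenize(output))
    # Each distinct reference unigram matches at most min(ref, out) times.
    matches = sum(min(n, out_counts[w]) for w, n in ref_counts.items())
    out_total = sum(out_counts.values())
    ref_total = sum(ref_counts.values())
    precision = matches / out_total if out_total else 0.0
    recall = matches / ref_total if ref_total else 0.0
    return matches, precision, recall


def unclipped_precision(reference, output):
    """Count every output token found in the reference, repeats included."""
    ref_words = set(tokenize(reference))
    out_tokens = tokenize(output)
    hits = sum(1 for w in out_tokens if w in ref_words)
    return hits / len(out_tokens) if out_tokens else 0.0


reference = "It is cold outside."
output = "cold, cold, cold, cold."
print(rouge1_clipped(reference, output))       # (matches, precision, recall)
print(unclipped_precision(reference, output))  # precision without clipping
```

For the example in this thread, the clipped version gives 1 match with precision and recall of 0.25 each, while the unclipped count gives a precision of 1.0.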

Igor_Pereverzev · February 10, 2025, 11:19am

If I understand correctly, the values should be:

ROUGE-1 Recall: 0.25 (25%)
Precision: 0.25 (25%)

brec · February 10, 2025, 1:16pm

Yes, the same correction should apply to Recall.
