In the benchmarks, when we see ROUGE scores (ROUGE-1, ROUGE-2, ROUGE-L), what is the actual underlying score? Recall or Precision or F1 or some other transformation of some combination of these values?
This is an article that explains each one of them, with their respective formulas in terms of precision, recall, and F1. To go straight to the answer:

The mean of the per-example F1 scores gives the full ROUGE-1 score for the dataset (and similarly for ROUGE-2 and ROUGE-L).
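A minimal sketch of that computation, assuming whitespace tokenization and clipped n-gram overlap counts (reference implementations also apply stemming and other normalization, so exact scores will differ):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Per-example ROUGE-N: returns (precision, recall, F1)."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Clipped overlap: each n-gram counted at most min(candidate, reference) times
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

# Dataset-level ROUGE-1 = mean of per-example F1 scores
pairs = [
    ("the cat sat", "the cat sat on the mat"),
    ("a dog ran", "the dog ran home"),
]
f1_scores = [rouge_n(cand, ref, n=1)[2] for cand, ref in pairs]
rouge1 = sum(f1_scores) / len(f1_scores)
```

Precision divides the overlap by the candidate's n-gram count and recall by the reference's, which is why F1 is the balanced summary usually reported in benchmarks.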