Bug: C4_W1_Ungraded_Lab_3_Bleu_Score Incorrectly Implemented

The intent of the lab is to illustrate a manual calculation of a BLEU score and verify it against a result from the "sacrebleu" library.
The sacrebleu library reports 0.0 for both of the tests illustrated.
0.0 and 0.0 do not compare well with the 27.6 and 35.3 calculated in Steps 1-4 of the lab.
The lab has failed in its basic premise.

The sacrebleu library has a number of defaults (its own tokenization and case handling, a default maximum n-gram length) and argument expectations (references passed as a list of lists) that could lead to the 0 scores with the lab inputs.
It looks like the lab code that calls sacrebleu is incomplete and cannot produce a reasonable result.
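
For concreteness, here is a minimal sketch of the argument shapes the sacrebleu documentation describes for corpus_bleu (the toy sentences are mine, not the lab's): the hypotheses are a list of strings, and the references are a list of reference streams, i.e. a list of lists of strings.

import sacrebleu

# Hypotheses: one string per segment.
hypotheses = ["The cat sat on the mat."]
# References: a list of reference streams; each stream holds one reference
# string per segment, so a single-reference setup is a list containing one list.
references = [["The cat is on the mat."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(round(bleu.score, 1))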

Hi @Gregory314159

Could you send me your lab code to check? I could not replicate your results (my scores are not 0.0, and they compare well with sacrebleu). Maybe you tinkered with the code somewhere?

The values are still 0:

print(
    "Results reference versus candidate 1 our own BLEU implementation: ",
    round(bleu_score(tokenized_corpus_cand, tokenized_corpus_ref) * 100, 1),
)

Results reference versus candidate 1 our own BLEU implementation: 43.6

print(
    "Results reference versus candidate 1 sacrebleu library BLEU: ",
    round(sacrebleu.corpus_bleu(wmt19_can_1, wmt19_ref_1).score, 1),
)

Results reference versus candidate 1 sacrebleu library BLEU: 0.0
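
If wmt19_can_1 and wmt19_ref_1 are plain strings (an assumption on my part; the cell that builds them is not shown here), one quick check of the list-of-lists point raised above is to wrap them in the shapes corpus_bleu documents:

# Hypothetical check, assuming wmt19_can_1 and wmt19_ref_1 are single strings:
# pass a list of hypotheses and a list of reference streams (a list of lists).
print(
    round(sacrebleu.corpus_bleu([wmt19_can_1], [[wmt19_ref_1]]).score, 1)
)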


Hi @Thomas1

Can you send me your notebook for me to check?

C4_W1_Ungraded_Lab_3_Bleu_Score.ipynb (54.5 KB)
Sure

Hi @arvyzukai
It is not a user-side error; it is a bug in the lab itself.

The supplied lab does not correctly use the sacrebleu library.
The supplied lab does not indicate that the user should edit any of the erroneous cells.

This error has existed for some time as there are other posts describing it.

Interested readers can obviously look up the sacrebleu documentation, but that is not indicated in the notebook in any way.

Again, the intent of the lab, expressed in the opening cells, is to illustrate that a manual example of calculating a BLEU score compares with the sacrebleu library result.
It does not compare well at all, due to errors on the authoring (not the student) side of the notebook.

Sacre bleu! What a mess.
GitHub - mjpost/sacrebleu: Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons
NLTK BLEU score calculation: NLTK :: nltk.translate.bleu_score
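
For anyone who wants an independent cross-check against the nltk link above, here is a minimal sketch using NLTK's sentence-level scorer (the toy tokens are mine; note that NLTK reports BLEU on a 0-1 scale, so multiply by 100 to compare with sacrebleu):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# NLTK expects pre-tokenized input: a list of reference token lists
# and a single hypothesis token list.
reference_tokens = "the cat is on the mat".split()
candidate_tokens = "the cat sat on the mat".split()

score = sentence_bleu(
    [reference_tokens],
    candidate_tokens,
    smoothing_function=SmoothingFunction().method1,  # avoid zero scores from missing higher-order n-grams
)
print(round(score * 100, 1))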

In Step 5, try the sentence-level scorer instead; sacrebleu.sentence_bleu takes the candidate as a single string and the references as a list of strings:

print(
    "Results reference versus candidate 1 sacrebleu library sentence BLEU: ",
    #round(sacrebleu.corpus_bleu(candidate_1, reference).score, 1),
    round(sacrebleu.sentence_bleu(candidate_1, [reference]).score, 1),
)
print(
    "Results reference versus candidate 2 sacrebleu library sentence BLEU: ",
    #round(sacrebleu.corpus_bleu(candidate_2, reference).score, 1),
    round(sacrebleu.sentence_bleu(candidate_2, [reference]).score, 1),
)

Results reference versus candidate 1 sacrebleu library sentence BLEU: 27.6
Results reference versus candidate 2 sacrebleu library sentence BLEU: 35.3

Hi @Gregory314159

Thank you for noting this bug. I submitted the issue and it will be fixed as soon as possible.

Thanks