The intent of the lab is to illustrate a manual calculation of a BLEU score and to verify it against the result from the “sacrebleu” library.
The sacrebleu library reports a score of 0.0 for both of the tests illustrated.
Those two 0.0 scores do not compare well with the 27.6 and 35.3 calculated in steps 1-4 of the lab.
The lab has failed in its basic premise.
The sacrebleu library has a number of defaults (separate tokenization of capitalized words, base n-gram length) and argument expectations (list of lists) that could lead to the 0 scores with the lab inputs.
Looks like the lab code using sacrebleu is incomplete and cannot produce a reasonable result.
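
As a rough illustration of how the n-gram length and case handling mentioned above can collapse a score, here is a minimal sketch of an unsmoothed, single-reference BLEU. It is not the lab's code and not sacrebleu's implementation: `simple_bleu` is a hypothetical helper, and the whitespace tokenization and lack of smoothing are assumptions made to keep the example short.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(hypothesis, reference, max_n=4, lowercase=False):
    """Unsmoothed single-reference BLEU on whitespace tokens, scaled to 0-100.

    Hypothetical helper for illustration only (assumption: no smoothing,
    whitespace tokenization, exactly one reference).
    """
    if lowercase:
        hypothesis, reference = hypothesis.lower(), reference.lower()
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: a hypothesis n-gram counts at most as often
        # as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            # One empty n-gram order zeroes the whole geometric mean.
            return 0.0
        log_precisions.append(math.log(matches / total))

    # Brevity penalty for hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)


# A perfect 3-token match still scores 0.0 when 4-grams are required.
print(simple_bleu("the cat sat", "the cat sat"))           # 0.0
print(simple_bleu("the cat sat", "the cat sat", max_n=2))  # 100.0

# One capitalized word lowers the score unless case is folded.
print(round(simple_bleu("The cat sat on the mat", "the cat sat on the mat"), 2))
print(simple_bleu("The cat sat on the mat", "the cat sat on the mat", lowercase=True))
```

In this sketch the short sentence scores 0.0 only because it has no 4-grams at all, so short test inputs scored without smoothing are one plausible source of a zero. Whether that is what happens in the lab depends on how its cells call sacrebleu.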
Could you send me your lab code to check? I could not replicate your results (my scores are not 0.0 and they compare well with sacrebleu). Maybe you tinkered with the code somewhere?
Hi @arvyzukai
It is not a user-side error; it is a bug in the lab itself.
The supplied lab does not correctly use the sacrebleu library.
The supplied lab does not indicate that the user should edit any of the erroneous cells.
This error has existed for some time, as there are other posts describing it.
Interested readers can obviously look up the sacrebleu documentation, but that is not indicated in the notebook in any way.
Again, the intent of the lab, as expressed in the opening cells, is to illustrate that a manual calculation of a BLEU score compares well with the sacrebleu library result.
It does not compare well at all, due to errors on the authoring (not the student) side of the notebook.