C4W1 - UNQ_10 - How to debug choice between valid translations

When I ran the final unit test on UNQ_10:

# UNIT TEST
# test mbr_decode
w1_unittest.test_mbr_decode(target=mbr_decode, score_fn=average_overlap, similarity_fn=rouge1_similarity)

I got the following result:

Expected output does not match
 3  Tests passed
 1  Tests failed

So, I added various print statements to my code to better understand where the problem lay. The results were as follows:

+++++++++++++++++++++++++++++++++++++++++++++++
sentence I eat soup.
n_samples 4
temperature 0.6
---> 0 -0.0003108978271484375 0.999689150496573 Ich iss Suppe.
---> 1 -0.0003108978271484375 0.999689150496573 Ich iss Suppe.
---> 2 -0.000225067138671875 0.9997749581870365 Ich esse Schweine.
---> 3 -0.000110626220703125 0.9998893798981516 Ich esse Suppe.

 3 -0.000110626220703125
translated_sentence Ich esse Suppe. 0.9998893798981516
+++++++++++++++++++++++++++++++++++++++++++++++
Expected output does not match
+++++++++++++++++++++++++++++++++++++++++++++++
sentence I am hungry
n_samples 4
temperature 0.6
---> 0 -1.2909164428710938 0.27501862870316235 Ich bin hungrig da
---> 1 -2.09808349609375e-05 0.9999790193851352 Ich bin hungrig.
---> 2 -2.09808349609375e-05 0.9999790193851352 Ich bin hungrig.
---> 3 -2.09808349609375e-05 0.9999790193851352 Ich bin hungrig.

 3 -2.09808349609375e-05
translated_sentence Ich bin hungrig. 0.9999790193851352
+++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++
sentence Congratulations!
n_samples 4
temperature 0.6
---> 0 -3.814697265625e-06 0.9999961853100103 Herzlichen Glückwunsch!
---> 1 -3.814697265625e-06 0.9999961853100103 Herzlichen Glückwunsch!
---> 2 -3.814697265625e-05 0.9999618537549303 Ich gratuliere Ihnen!
---> 3 -3.814697265625e-06 0.9999961853100103 Herzlichen Glückwunsch!

 3 -3.814697265625e-06
translated_sentence Herzlichen Glückwunsch! 0.9999961853100103
+++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++
sentence You have completed the assignment!
n_samples 4
temperature 0.6
---> 0 -0.000232696533203125 0.9997673305385353 Sie haben die Aufgabe erfüllt!
---> 1 -4.9591064453125e-05 0.9999504101651634 Sie haben die Abmeldung abgeschlossen!
---> 2 -2.47955322265625e-05 0.9999752047751801 Sie haben die Abtretung abgeschlossen!
---> 3 -2.47955322265625e-05 0.9999752047751801 Sie haben die Abtretung abgeschlossen!

 3 -2.47955322265625e-05
translated_sentence Sie haben die Abtretung abgeschlossen! 0.9999752047751801
+++++++++++++++++++++++++++++++++++++++++++++++
 3  Tests passed
 1  Tests failed

The error seems to lie in choosing “Ich esse Suppe”, instead of (I assume) “Ich iss Suppe”.

However, when I type either sentence into Google Translate, I get the English translation “I eat soup”.

When I look at various web pages that describe the conjugation of the German verb “to eat” (found by Googling “German verb eat”; e.g. Essen German Conjugation | Study.com or Conjugation of essen (to eat) in German | coLanguage), I find that “esse” goes with “I”, while “iss” (or “isst”) goes with “you” or “he/she/it”.

So now I’m left wondering two things: how to debug in order to favor one valid translation over another, and whether there is more stochasticity in NMT models beyond the log-softmax sampling function. Or is there some other reason that explains why my code chose “esse” over “iss”?
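
For reference, here is my mental model of where the randomness enters. As I understand it, the sampler draws Gumbel noise and takes an argmax over the noisy log-probabilities (the Gumbel-max trick); the sketch below is my own numpy reconstruction, not the course helper, so the function name and details are assumptions:

import numpy as np

def logsoftmax_sample(log_probs, temperature=0.6):
    # Sample a token index from log-probabilities via the Gumbel-max trick.
    # As temperature -> 0 this approaches a plain argmax (greedy decoding);
    # larger temperatures let lower-probability tokens through more often.
    u = np.random.uniform(low=1e-6, high=1.0 - 1e-6, size=log_probs.shape)
    gumbel_noise = -np.log(-np.log(u))
    return int(np.argmax(log_probs + gumbel_noise * temperature, axis=-1))

If that picture is right, this per-token noise is the only stochastic step; everything downstream (similarity, overlap scores, picking the best candidate) is deterministic.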

If anyone (who can) wants to see my code, the Lab ID is mjhlxfqb. I realize I’m getting hung up on what is probably a minor point, but I have to admit it’s really bugging me.

Hi @Steven1,

For the calculation of scores in your Ex 10, you are using weighted average overlap, but if you look at the function parameters for Ex 10, you’ll notice that a function to calculate the scores is already being passed in. Use that function instead.

Cheers,
Mubsi

P.S. I have removed all of the print statements from your Ex 10, as they would cause you grading issues. If you added extra print statements elsewhere in the notebook as well, be sure to remove them before submitting for grading; otherwise you’d end up with absurd errors from the autograder.
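
For anyone who hits the same thing, the intended shape is roughly the following (a minimal sketch only; generate_samples stands in for the assignment’s sampling helper, and the signatures are assumed from the unit-test call above, not copied from the course code):

def mbr_decode(sentence, n_samples, score_fn, similarity_fn, temperature=0.6):
    # generate_samples: placeholder for the assignment's helper that returns
    # candidate translations together with their log-probabilities (assumed).
    samples, log_probs = generate_samples(sentence, n_samples, temperature=temperature)

    # Use the scoring function that was passed in -- do not hard-code
    # weighted_avg_overlap (or average_overlap) here.
    scores = score_fn(similarity_fn, samples, log_probs)

    # Return the highest-consensus candidate, its index, and all scores.
    best = max(scores, key=scores.get)
    return samples[best], best, scores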


Hi @Steven1

Nice catch :) - in reality the German translation should be “Ich esse Suppe”. I was wondering why you got such high scores (if I’m interpreting your output correctly)? For this particular sentence, the test case should produce:

('Ich iss Suppe.',
 0, 
{0: 0.8571428571428572, 1: 0.8571428571428572, 2: 0.7619047619047619, 3: 0.8571428571428571})
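
In case the mechanics help, here is a word-level sketch of how such per-candidate scores are formed: each candidate is scored by its average ROUGE-1 similarity to every other candidate. (The function bodies are my reconstruction, and the assignment scores subword tokens rather than words, so this won’t reproduce the exact numbers above.)

from collections import Counter

def rouge1_similarity(system, reference):
    # Unigram-overlap F1 between two token lists.
    sys_counts, ref_counts = Counter(system), Counter(reference)
    overlap = sum(min(count, ref_counts[token]) for token, count in sys_counts.items())
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

def average_overlap(similarity_fn, samples, *ignored):
    # Score each candidate by its mean similarity to every other candidate.
    return {i: sum(similarity_fn(sample, other)
                   for j, other in enumerate(samples) if j != i) / (len(samples) - 1)
            for i, sample in enumerate(samples)}

candidates = [s.split() for s in ("Ich iss Suppe.", "Ich iss Suppe.",
                                  "Ich esse Schweine.", "Ich esse Suppe.")]
print(average_overlap(rouge1_similarity, candidates))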

Thanks!!! Now, the world makes sense again.


@arvyzukai - nope - what I had printed were log probs and probs. When I look at the weighted and non-weighted averages of the scores, I see this:

sentence: I eat soup.
non-weighted avgs: {0: 0.8571428571428572, 1: 0.8571428571428572, 2: 0.7619047619047619, 3: 0.8571428571428571}
weighted avgs: {0: 0.8571387701816032, 1: 0.8571387701816032, 2: 0.7619111199457392, 3: 0.857142857142857}

For the non-weighted averages, the “correct” sentence loses by 1e-16, which I believe qualifies as numerical noise. (Actually, the values make more sense as fractions: 0.857142… = 6/7.) For such a short sentence, I should probably try hand-calculating the scores…
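
A quick check with Python’s math.ulp (Python 3.9+) supports the numerical-noise reading: the two scores differ by about one unit in the last place for doubles near 6/7, i.e. a single least-significant bit from summing the same similarities in a different order.

import math
from fractions import Fraction

score = float(Fraction(6, 7))   # 0.8571428571428571
print(math.ulp(score))          # ~1.1e-16: gap between adjacent doubles here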

As a side question, I’m wondering how NMT does with colloquialisms. For instance, my grandparents were German émigrés, so I know (strongly believe) that a German would express hunger as “Ich habe Hunger” (i.e. “I have a hunger”, not “I am hungry”). Am I correct in believing that these are the sort of (culturally dependent?) translations that would give NMT trouble?

I’m not an expert on NMT, but as far as I know - not really. It of course depends on the dataset the model was trained on (and on the model architecture as well), but a phrase like this one should usually be captured easily, because its meaning follows directly from its parts (“I have hunger” is not that far from “I am hungry”). Such phrases might not feel intuitive to English speakers, but “statistically” I think they are not that hard to grasp.

Idioms and metaphors, on the other hand, are more problematic for NMT (e.g. “my job is a jail” as a metaphor, or “spill the beans” as an idiom, which means to reveal secret information unintentionally or indiscreetly), because there the meanings do not follow from the parts.