Got AssertionError:
Expected output: [7283, -9.92…]
Your output: [7283, -0.61…]
Got the symbol correctly, but the prob is wrong.
Passed all unit tests before this point.
Thank you for any suggestions.
Make sure you are not using the global model variable instead of the NMTAttn variable. Also check your padding implementation. Here you can find hints about what went wrong:
Cheers
Thank you. I used the NMTAttn variable instead of the global model variable. Otherwise, my next symbol would be 140. The padding also appears OK. My question is how I could get the correct next symbol but with a very different log probability. The expected log prob is -9.9290 (or 0.03%), and my computed log prob is -0.6154 (or 54.42%). That is a huge difference. Also, when I display the output variable, all the float numbers are in the -0.6 to -0.7 range.
Could it be that my NMTAttn model is incorrect, even though it passed unittest.test_NMTAttn?
Most probably the problem lies not in the model (if you passed the previous tests and got the expected outputs) but in the way you handle the outputs. For example, you might have unnecessarily applied a softmax or some other operation (your predicted token would still be correct, but the log probs would not be). Do your outputs match point 6 in my previously linked post?
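To illustrate (a toy numpy example, not code from the assignment): an extra monotonic operation such as another softmax leaves the argmax, and therefore the predicted token, unchanged, but the values are no longer the model's log probabilities.

import numpy as np

def softmax(x):
    # standard numerically stable softmax over the last axis
    e = np.exp(x - x.max())
    return e / e.sum()

# toy log-probabilities over a 4-word vocabulary
log_probs = np.array([-10.3, -9.9, -10.4, -10.1])
mangled = softmax(log_probs)  # an extra, unnecessary softmax

print(np.argmax(log_probs), np.argmax(mangled))  # same index -> same predicted token
print(log_probs[1], mangled[1])                  # -9.9 vs ~0.32 -> very different value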
Thank you. My output does not match your output in #6. My debugging output:
Current output tokens:
Token length: 0
The padded length is: 1
The padded is: [0]
output: [[[0. 0. 0. … 0. 0. 0.]]]
The log_probs is: [0. 0. 0. … 0. 0. 0.]
The symbol is: 0
The log_prob of symbol is: 0.0
Current output tokens: [18477]
Token length: 1
The padded length is: 2
The padded is: [18477, 0]
output: [[[-0.710495 -0.7022415 -0.7007726 … -0.72928303 -0.6969676
-0.69470644]
[-0.6760952 -0.6841349 -0.6855796 … -0.6582717 -0.6893413
-0.6915904 ]]]
The log_probs is: [-0.6760952 -0.6841349 -0.6855796 … -0.6582717 -0.6893413 -0.6915904]
The symbol is: 7283
The log_prob of symbol is: -0.6145443916320801
This indicates that your model outputs are off; for the test case they should be (approximately):
- for the first step:
[0]
[[[-10.323489 -10.371064 -10.382686 … -10.290064 -10.399619 -10.410941]]]
- for the second step:
[18477, 0]
[[[-10.323489 -10.371064 -10.382686 … -10.290064 -10.399619
-10.410941 ]
[-10.293335 -10.3572035 -10.371738 … -10.223298 -10.396238
-10.412071 ]]]
in other words, to get 7283, you would use:
[-10.293335 -10.3572035 -10.371738 … -10.223298 -10.396238
-10.412071 ]
and not (even though it would result in the same symbol):
[-0.6760952 -0.6841349 -0.6855796 … -0.6582717 -0.6893413
-0.6915904 ]
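For reference, a rough sketch of how that row gets selected, padding included (variable names such as NMTAttn, input_tokens and cur_output_tokens follow the notebook, but treat this as an illustration under those assumptions rather than the reference solution; the notebook may also sample with a temperature instead of taking a plain argmax):

import numpy as np

def next_symbol_sketch(NMTAttn, input_tokens, cur_output_tokens):
    # number of tokens generated so far
    token_length = len(cur_output_tokens)
    # pad the generated tokens up to the next power of two
    # (0 tokens -> length 1, 1 token -> length 2, ... as in the debug trace above)
    padded_length = 2 ** int(np.ceil(np.log2(token_length + 1)))
    padded = cur_output_tokens + [0] * (padded_length - token_length)
    padded_with_batch = np.array(padded)[None, :]  # add the batch dimension
    # call the NMTAttn argument, not a global `model` variable
    output, _ = NMTAttn((input_tokens, padded_with_batch))
    # the log probs for the next position live in the row at index token_length
    log_probs = output[0, token_length, :]
    symbol = int(np.argmax(log_probs))
    return symbol, float(log_probs[symbol])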
So the question then would be: did you really pass all the previous tests, and is your model:
Expected Output:
Serial_in2_out2[
Select[0,1,0,1]_in2_out4
Parallel_in2_out2[
Serial[
Embedding_33300_1024
LSTM_1024
LSTM_1024
]
Serial[
Serial[
ShiftRight(1)
]
Embedding_33300_1024
LSTM_1024
]
]
PrepareAttentionInput_in3_out4
Serial_in4_out2[
Branch_in4_out3[
None
Serial_in4_out2[
_in4_out4
Serial_in4_out2[
Parallel_in3_out3[
Dense_1024
Dense_1024
Dense_1024
]
PureAttention_in4_out2
Dense_1024
]
_in2_out2
]
]
Add_in2
]
Select[0,2]_in3_out2
LSTM_1024
LSTM_1024
Dense_33300
LogSoftmax
]
And also, have you initialized the weights from a pre-trained model in Section 4 - Testing?
Refreshing the environment and running all the steps might help (in case you saved the wrong model).
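For reference, in Trax that initialization usually looks something like this (the file name below is only a placeholder; use whatever the notebook's Section 4 actually loads):

# assuming NMTAttn(mode='eval') is the model constructor defined earlier in the notebook
model = NMTAttn(mode='eval')
# load pre-trained weights; the file name is a placeholder, not the notebook's actual path
model.init_from_file("model.pkl.gz", weights_only=True)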
Thank you. My model passed the unit test and it is displayed as follows:
Serial_in2_out2[
Select[0,1,0,1]_in2_out4
Parallel_in2_out2[
Serial[
Embedding_33300_1024
LSTM_1024
LSTM_1024
]
Serial[
Serial[
ShiftRight(1)
]
Embedding_33300_1024
LSTM_1024
]
]
PrepareAttentionInput_in3_out4
Serial_in4_out2[
Branch_in4_out3[
None
Serial_in4_out2[
_in4_out4
Serial_in4_out2[
Parallel_in3_out3[
Dense_1024
Dense_1024
Dense_1024
]
PureAttention_in4_out2
Dense_1024
]
_in2_out2
]
]
Add_in2
]
Select[0,2]_in3_out2
LSTM_1024
LSTM_1024
Dense_33300
LogSoftmax
]
I believe it matches the expected output. I usually restart the kernel and re-run the whole notebook, so that should initialize the weights from the pre-trained model in Section 4.
BTW, I suspect the log probs returned by my model might not be normalized, as it appears that the sum of probabilities for all vocabulary tokens can add up to more than 1.
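That suspicion is easy to check with a couple of lines (the helper name is mine; log_probs is assumed to be the vocabulary row extracted for the next position): exponentiating correctly normalized log probabilities and summing over the 33300-word vocabulary should give roughly 1.

import numpy as np

def probs_sum(log_probs):
    # exponentiate the log-probs and sum over the vocabulary axis;
    # a correctly normalized distribution sums to roughly 1.0
    return float(np.exp(np.asarray(log_probs)).sum())

# e.g. probs_sum(output[0, token_length, :])
# values in the -0.6 to -0.7 range give exp(...) of about 0.5 per token,
# so the sum over 33300 tokens lands in the tens of thousands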
If someone else in the future encounters the same values (Your output: [7283, -0.61…]): you probably specified the wrong axis for the LogSoftmax layer.
You do not need to specify the axis for the LogSoftmax layer in this assignment (the default value, -1, the last dimension of the tensor, is the correct one).
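A small numpy sketch of that effect (the shapes match the debug trace above, but the logits are random stand-ins, not the model's): normalizing over the length axis of size 2 instead of the vocabulary axis yields values around log(1/2) ≈ -0.69, exactly the reported range, while normalizing over the 33300-word vocabulary yields values around -log(33300) ≈ -10.4, like the expected output. The predicted symbol can still coincide, as it did here, which is why only the log probability check failed.

import numpy as np

def log_softmax(x, axis):
    # numerically stable log-softmax along the given axis
    m = x.max(axis=axis, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

vocab_size = 33300
rng = np.random.default_rng(0)
# stand-in decoder activations with shape (batch=1, padded_length=2, vocab=33300)
logits = rng.normal(scale=0.1, size=(1, 2, vocab_size))

right = log_softmax(logits, axis=-1)  # normalize over the vocabulary (correct)
wrong = log_softmax(logits, axis=1)   # normalize over the length-2 axis (wrong)

print(right[0, 1, :3])  # values near -log(33300), about -10.4, like the expected output
print(wrong[0, 1, :3])  # values near -log(2), about -0.69, like the reported output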
Thanks for the explanation!!!