Got AssertionError:
Expected output: [7283, -9.92…]
Your output: [7283, -0.61…]
Got the symbol correctly, but the prob is wrong.
Passed all unit tests before this point.
Thank you for any suggestions.
Make sure you are not using the global model variable instead of the NMTAttn variable. Also check your padding implementation. Here you can find hints about what went wrong:
Cheers
Thank you. I used the NMTAttn variable instead of the global model variable. Otherwise, my next symbol would be 140. The padding also appears OK. My question is how I could get the correct next symbol but with a very different log probability. The expected log prob is -9.9290 (or 0.03%), and my computed log prob is -0.6154 (or 54.42%). That is a huge difference. Also, when I display the output variable, all the float numbers are in the -0.6 to -0.7 range.
Could it be that my NMTAttn model is incorrect, even though it passed unittest.test_NMTAttn?
Most probably the problem lies not in the model (if you passed the previous tests and got the expected outputs) but in the way you handle the outputs. For example, you might have unnecessarily applied a softmax or some other operation (your predicted token would still be correct, but the log probs would not be). Do your outputs match point 6 in my previously linked post?
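To illustrate (a toy numpy example, not code from the assignment): an extra monotonic operation such as another softmax leaves the argmax, and therefore the predicted token, unchanged, but the values are no longer the model's log probabilities.

import numpy as np

def softmax(x):
    # standard numerically stable softmax over the last axis
    e = np.exp(x - x.max())
    return e / e.sum()

# toy log-probabilities over a 4-word vocabulary
log_probs = np.array([-10.3, -9.9, -10.4, -10.1])
mangled = softmax(log_probs)  # an extra, unnecessary softmax

print(np.argmax(log_probs), np.argmax(mangled))  # same index -> same predicted token
print(log_probs[1], mangled[1])                  # -9.9 vs ~0.32 -> very different value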
Thank you. My output does not match your output in #6. My debugging output:
Current output tokens:
Token length: 0
The padded length is: 1
The padded is: [0]
output: [[[0. 0. 0. … 0. 0. 0.]]]
The log_probs is: [0. 0. 0. … 0. 0. 0.]
The symbol is: 0
The log_prob of symbol is: 0.0
Current output tokens: [18477]
Token length: 1
The padded length is: 2
The padded is: [18477, 0]
output: [[[-0.710495 -0.7022415 -0.7007726 … -0.72928303 -0.6969676
-0.69470644]
[-0.6760952 -0.6841349 -0.6855796 … -0.6582717 -0.6893413
-0.6915904 ]]]
The log_probs is: [-0.6760952 -0.6841349 -0.6855796 … -0.6582717 -0.6893413 -0.6915904]
The symbol is: 7283
The log_prob of symbol is: -0.6145443916320801
This indicates that your model outputs are off; for the test case they should be (approximately):
- for the first step:
[0]
[[[-10.323489 -10.371064 -10.382686 … -10.290064 -10.399619 -10.410941]]]
- for the second step:
[18477, 0]
[[[-10.323489 -10.371064 -10.382686 … -10.290064 -10.399619
-10.410941 ]
[-10.293335 -10.3572035 -10.371738 … -10.223298 -10.396238
-10.412071 ]]]
in other words, to get 7283, you would use:
[-10.293335 -10.3572035 -10.371738 … -10.223298 -10.396238
-10.412071 ]
and not (even though it would result in the same symbol):
[-0.6760952 -0.6841349 -0.6855796 … -0.6582717 -0.6893413
-0.6915904 ]
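For reference, a rough sketch of how that row gets selected, padding included (variable names such as NMTAttn, input_tokens and cur_output_tokens follow the notebook, but treat this as an illustration under those assumptions rather than the reference solution; the notebook may also sample with a temperature instead of taking a plain argmax):

import numpy as np

def next_symbol_sketch(NMTAttn, input_tokens, cur_output_tokens):
    # number of tokens generated so far
    token_length = len(cur_output_tokens)
    # pad the generated tokens up to the next power of two
    # (0 tokens -> length 1, 1 token -> length 2, ... as in the debug trace above)
    padded_length = 2 ** int(np.ceil(np.log2(token_length + 1)))
    padded = cur_output_tokens + [0] * (padded_length - token_length)
    padded_with_batch = np.array(padded)[None, :]  # add the batch dimension
    # call the NMTAttn argument, not a global `model` variable
    output, _ = NMTAttn((input_tokens, padded_with_batch))
    # the log probs for the next position live in the row at index token_length
    log_probs = output[0, token_length, :]
    symbol = int(np.argmax(log_probs))
    return symbol, float(log_probs[symbol])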
So the question then would be: did you really pass all the previous tests, and is your model:
Expected Output:
Serial_in2_out2[
Select[0,1,0,1]_in2_out4
Parallel_in2_out2[
Serial[
Embedding_33300_1024
LSTM_1024
LSTM_1024
]
Serial[
Serial[
ShiftRight(1)
]
Embedding_33300_1024
LSTM_1024
]
]
PrepareAttentionInput_in3_out4
Serial_in4_out2[
Branch_in4_out3[
None
Serial_in4_out2[
_in4_out4
Serial_in4_out2[
Parallel_in3_out3[
Dense_1024
Dense_1024
Dense_1024
]
PureAttention_in4_out2
Dense_1024
]
_in2_out2
]
]
Add_in2
]
Select[0,2]_in3_out2
LSTM_1024
LSTM_1024
Dense_33300
LogSoftmax
]
And also, have you initialized the weights from a pre-trained model in Section 4 - Testing?
Refreshing the environment and running all the steps might help (in case you saved the wrong model).
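For reference, in Trax that initialization usually looks something like this (the file name below is only a placeholder; use whatever the notebook's Section 4 actually loads):

# assuming NMTAttn(mode='eval') is the model constructor defined earlier in the notebook
model = NMTAttn(mode='eval')
# load pre-trained weights; the file name is a placeholder, not the notebook's actual path
model.init_from_file("model.pkl.gz", weights_only=True)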
Thank you. My model passed the unit test and it is displayed as follows:
Serial_in2_out2[
Select[0,1,0,1]_in2_out4
Parallel_in2_out2[
Serial[
Embedding_33300_1024
LSTM_1024
LSTM_1024
]
Serial[
Serial[
ShiftRight(1)
]
Embedding_33300_1024
LSTM_1024
]
]
PrepareAttentionInput_in3_out4
Serial_in4_out2[
Branch_in4_out3[
None
Serial_in4_out2[
_in4_out4
Serial_in4_out2[
Parallel_in3_out3[
Dense_1024
Dense_1024
Dense_1024
]
PureAttention_in4_out2
Dense_1024
]
_in2_out2
]
]
Add_in2
]
Select[0,2]_in3_out2
LSTM_1024
LSTM_1024
Dense_33300
LogSoftmax
]
I believe it matches the expected output. I usually restart the kernel and re-run the whole notebook, so that should initialize the weights from the pre-trained model in Section 4.
BTW, I suspect the log probs returned by my model might not be normalized, as it appears that the sum of probabilities for all vocabulary tokens can add up to more than 1.
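That suspicion is easy to check with a couple of lines (the helper name is mine; log_probs is assumed to be the vocabulary row extracted for the next position): exponentiating correctly normalized log probabilities and summing over the 33300-word vocabulary should give roughly 1.

import numpy as np

def probs_sum(log_probs):
    # exponentiate the log-probs and sum over the vocabulary axis;
    # a correctly normalized distribution sums to roughly 1.0
    return float(np.exp(np.asarray(log_probs)).sum())

# e.g. probs_sum(output[0, token_length, :])
# values in the -0.6 to -0.7 range give exp(...) of about 0.5 per token,
# so the sum over 33300 tokens lands in the tens of thousands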
If someone else in the future encounters the same values (Your output: [7283, -0.61…]): you probably specified the wrong axis for the LogSoftmax layer.
You do not need to specify the axis for the LogSoftmax layer in this assignment (the default value, -1, the last dimension of the tensor, is the correct one).
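A small numpy sketch of that effect (the shapes match the debug trace above, but the logits are random stand-ins, not the model's): normalizing over the length axis of size 2 instead of the vocabulary axis yields values around log(1/2) ≈ -0.69, exactly the reported range, while normalizing over the 33300-word vocabulary yields values around -log(33300) ≈ -10.4, like the expected output. The predicted symbol can still coincide, as it did here, which is why only the log probability check failed.

import numpy as np

def log_softmax(x, axis):
    # numerically stable log-softmax along the given axis
    m = x.max(axis=axis, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

vocab_size = 33300
rng = np.random.default_rng(0)
# stand-in decoder activations with shape (batch=1, padded_length=2, vocab=33300)
logits = rng.normal(scale=0.1, size=(1, 2, vocab_size))

right = log_softmax(logits, axis=-1)  # normalize over the vocabulary (correct)
wrong = log_softmax(logits, axis=1)   # normalize over the length-2 axis (wrong)

print(right[0, 1, :3])  # values near -log(33300), about -10.4, like the expected output
print(wrong[0, 1, :3])  # values near -log(2), about -0.69, like the reported output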
Thanks for the explanation!!!