Hello, I’ve spent hours debugging, going through many of the postings here, but I still can’t figure out how to fix the errors in Exercises 5 and 7. Could anyone help me out? The funny (or maybe not so funny) thing is that all the other tests (i.e. exercises) pass.
Here are the interim outputs I produced while debugging, following the approach @saifkhanengr suggested in an earlier post.
Exercise 5 (Encoder)
x after word embeddings:[[[ 0.00860578 0.00740315 -0.0409526 0.00755553]
[ 0.01270969 0.04936013 -0.04764051 -0.04633161]
[-0.02472718 -0.03895496 0.01122528 -0.03709315]]
[[ 0.01270969 0.04936013 -0.04764051 -0.04633161]
[ 0.00860578 0.00740315 -0.0409526 0.00755553]
[ 0.0144151 0.03082472 0.03976548 0.01368902]]]
x after scale embeddings:[[[ 0.01721156 0.01480629 -0.0819052 0.01511107]
[ 0.02541938 0.09872026 -0.09528103 -0.09266322]
[-0.04945436 -0.07790992 0.02245057 -0.0741863 ]]
[[ 0.02541938 0.09872026 -0.09528103 -0.09266322]
[ 0.01721156 0.01480629 -0.0819052 0.01511107]
[ 0.02883019 0.06164945 0.07953096 0.02737803]]]
x after positional encodings:[[[ 0.01721156 1.0148063 -0.0819052 1.0151111 ]
[ 0.8668903 0.6390225 -0.08528119 0.90728676]
[ 0.8598431 -0.49405676 0.04244923 0.9256137 ]]
[[ 0.02541938 1.0987203 -0.09528103 0.9073368 ]
[ 0.8586825 0.55510855 -0.07190537 1.015061 ]
[ 0.9381276 -0.3544974 0.09952962 1.027178 ]]]
x after dropout:[[[ 0.01912395 1.1275625 -0.09100578 1.1279013 ]
[ 0.9632115 0.7100251 -0.09475689 1.0080965 ]
[ 0.9553812 -0.548952 0.04716581 1.0284597 ]]
[[ 0.02824375 1.2208004 -0.10586781 1.008152 ]
[ 0.9540917 0.6167873 -0.07989486 1.1278456 ]
[ 1.042364 -0.39388603 0.11058848 1.141309 ]]]
final x:[[[-0.8311426 1.1151567 -1.1483109 0.86429673]
[ 0.6521363 -0.16201025 -1.5543296 1.0642036 ]
[ 1.0701332 -1.2811097 -0.6663007 0.8772773 ]]
[[-0.8419315 1.1047187 -1.1404076 0.87762046]
[ 0.6385438 -0.14090931 -1.5619406 1.064306 ]
[ 1.0678458 -1.285623 -0.6602554 0.87803257]]]
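For reference, the three stages printed above (word embeddings, scaling, positional encodings) can be sketched in NumPy. The dumps are consistent with d_model = 4, so the scale factor is sqrt(4) = 2 and the first position's encoding is [0, 1, 0, 1]. This is a minimal sketch, not the assignment's actual TensorFlow code:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # Standard sinusoidal positional encoding (Vaswani et al., 2017)
    pos = np.arange(max_len)[:, None]          # (max_len, 1)
    i = np.arange(d_model)[None, :]            # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])       # even indices -> sin
    pe[:, 1::2] = np.cos(angle[:, 1::2])       # odd indices  -> cos
    return pe[None, ...]                       # add a batch axis

d_model = 4
x = np.random.randn(2, 3, d_model) * 0.05      # stand-in for word embeddings
x = x * np.sqrt(d_model)                       # "scale embeddings": each value doubles
x = x + positional_encoding(3, d_model)        # "positional encodings"
```

You can check both steps against the dump: every scaled value is exactly twice the embedding value, and position 0 gains exactly [0, 1, 0, 1].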
Exercise 7 (Decoder)
x after word embeddings:[[[-0.02472718 -0.03895496 0.01122528 -0.03709315]
[ 0.00860578 0.00740315 -0.0409526 0.00755553]
[ 0.01270969 0.04936013 -0.04764051 -0.04633161]]
[[ 0.00860578 0.00740315 -0.0409526 0.00755553]
[ 0.01270969 0.04936013 -0.04764051 -0.04633161]
[ 0.0144151 0.03082472 0.03976548 0.01368902]]]
x after scale embeddings:[[[-0.04945436 -0.07790992 0.02245057 -0.0741863 ]
[ 0.01721156 0.01480629 -0.0819052 0.01511107]
[ 0.02541938 0.09872026 -0.09528103 -0.09266322]]
[[ 0.01721156 0.01480629 -0.0819052 0.01511107]
[ 0.02541938 0.09872026 -0.09528103 -0.09266322]
[ 0.02883019 0.06164945 0.07953096 0.02737803]]]
x after positional encodings:[[[-0.04945436 0.92209005 0.02245057 0.9258137 ]
[ 0.8586825 0.55510855 -0.07190537 1.015061 ]
[ 0.93471676 -0.3174266 -0.07528237 0.9071368 ]]
[[ 0.01721156 1.0148063 -0.0819052 1.0151111 ]
[ 0.8668903 0.6390225 -0.08528119 0.90728676]
[ 0.9381276 -0.3544974 0.09952962 1.027178 ]]]
x after dropout:[[[-0.04945436 0.92209005 0.02245057 0.9258137 ]
[ 0.8586825 0.55510855 -0.07190537 1.015061 ]
[ 0.93471676 -0.3174266 -0.07528237 0.9071368 ]]
[[ 0.01721156 1.0148063 -0.0819052 1.0151111 ]
[ 0.8668903 0.6390225 -0.08528119 0.90728676]
[ 0.9381276 -0.3544974 0.09952962 1.027178 ]]]
final x:[[[-0.9803824 1.6441399 -0.5720506 -0.09170692]
[-0.71146524 1.7123504 -0.6747776 -0.3261074 ]
[-0.6284193 1.7208974 -0.6992812 -0.39319703]]
[[-0.98235804 1.6432738 -0.571514 -0.08940189]
[-0.7147792 1.7112614 -0.6773841 -0.319098 ]
[-0.62005705 1.7221041 -0.6970215 -0.4050255 ]]]
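One difference worth noting between the two dumps: in the encoder run the post-dropout values are all about 1.111x the post-positional-encoding values (inverted dropout with rate 0.1 in training mode), while in the decoder run they are identical (dropout is the identity at inference). A minimal NumPy sketch of inverted dropout, assuming a rate of 0.1:

```python
import numpy as np

rate = 0.1
x = np.ones((2, 3, 4))

rng = np.random.default_rng(0)
keep = rng.random(x.shape) >= rate             # True for units that survive

# Training mode: drop units and rescale survivors by 1/(1 - rate) ~= 1.111,
# matching the encoder dump above
train_out = np.where(keep, x / (1.0 - rate), 0.0)

# Inference mode: dropout is a no-op, matching the decoder dump above
infer_out = x
```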
I’ve resolved these issues. The problem was my use of “sum” in the encoder/decoder layer code: it didn’t cause any failures in those layers’ own tests, but it unexpectedly led to these wrong values. Once I replaced it, everything passed.
I’d appreciate it if anyone could explain why “sum” didn’t work.
Where exactly were you using “sum()”?
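Without seeing the exact call site, one common pitfall is that Python's built-in sum() treats an array as an iterable over its first axis, so it reduces and broadcasts instead of doing the plain element-wise addition that "+" (or tf.add / the Add layer) performs. A minimal NumPy sketch of how this can produce the right shape but silently wrong values (hypothetical code, not necessarily the original poster's actual line):

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)               # residual input, shape (2, 3)
attn = np.ones_like(x)                         # stand-in attention output

# Intended residual connection: plain element-wise addition
ok = x + attn                                  # every entry is x + 1

# Python's built-in sum(iterable, start) iterates over axis 0,
# so sum(attn, x) computes x + attn[0] + attn[1] via broadcasting:
# same shape as ok, but wrong values
bad = sum(attn, x)                             # every entry is x + 2
```

Because the result still has the right shape, shape checks on the layer can pass while downstream value checks fail, which matches the symptom described above.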