Attention Weights Issue in C5 W3 Neural Machine Translation Assignment (Section 3.1)

In the optional part of the C5 W3 Neural Machine Translation assignment, specifically in section 3.1 - Getting the Attention Weights From the Network, I’m encountering an issue where the attention values output by the network are clearly incorrect: the condition \sum_{t'} \alpha^{\langle t, t' \rangle} = 1 is not satisfied, as shown in the attached image.

[attached image: attention weight values]

Despite this, I still received a 100% score from the grader. I’ve reviewed my code thoroughly and everything seems correct.
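
For reference, this is roughly how I’m checking the condition (a minimal standalone sketch; `attention_map` is a placeholder for the (T_y, T_x) weight array I get back from the notebook, not the real values):

```python
import numpy as np

# attention_map stands in for the (Ty, Tx) array of attention weights
# returned by the notebook; correct weights should make every row sum to 1.
attention_map = np.random.rand(10, 30)   # placeholder values, not real weights

row_sums = attention_map.sum(axis=1)     # one sum per output timestep t
print(row_sums)
print(np.allclose(row_sums, 1.0))        # should print True for correct weights
```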

Has anyone else experienced a similar problem? Any insights or suggestions would be greatly appreciated.

Thanks!


Hi @toni_1!

Is it possible that you are outputting the attention weights from different queries? Maybe try printing the attention weight values for several different queries and checking again.

Also, keep in mind that the alpha values are the probability outputs of the softmax, so the sum-to-one equation holds by construction; satisfying it alone is not a confirmation that the attention weight values are correct. Hope that part is clear.
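
To illustrate (a standalone numpy sketch, not the assignment’s code): whatever the input energies are, the softmax output always sums to 1 along the axis it is computed over:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

energies = np.random.randn(5, 8)     # stand-in for the energies e^<t, t'>
alphas = softmax(energies, axis=1)   # normalize over t' for each t
print(alphas.sum(axis=1))            # every entry is ~1.0 by construction
```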

@XinghaoZong I’ve tried other queries, but I’m encountering the same issue.

Looking at Figure 8: Full Attention Map in the notebook, it’s clear that each row should sum to 1. It seems like the softmax function applied to e^{\langle t, t' \rangle} isn’t working as expected.
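
For example, a small standalone numpy sketch (assuming nothing about the notebook’s actual implementation) shows how a softmax taken over the wrong axis would produce exactly this symptom: the columns rather than the rows end up summing to 1:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

energies = np.random.randn(10, 30)    # stand-in for e^<t, t'>, shape (Ty, Tx)

right = softmax(energies, axis=1)     # normalize over t': rows sum to 1
wrong = softmax(energies, axis=0)     # normalize over t: columns sum to 1

print(np.allclose(right.sum(axis=1), 1.0))  # True
print(np.allclose(wrong.sum(axis=1), 1.0))  # False -- rows no longer sum to 1
```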

I’m not entirely sure what @Deepti_Prasad means by the equation not being “a confirmation that the attention weight values are correct.”

If anyone has additional insights or suggestions, I would appreciate the help. Thanks for your responses!