About encoder structure in C4_W3_Assignment

In the notebook image the structure is depicted as the attention layer first, then the residual addition, and then layer norm. However, in the coding part it is structured as:

```python
# add residual layer
tl.Residual(
    # add norm layer
    tl.LayerNorm(),
    # add attention
    attention,
    # add dropout
    dropout_,
),
```

Why is that?

Hi @Richard_Tsai

That is a good question :+1: The short answer is that the picture shows the "Post-LN" arrangement from the original Attention Is All You Need paper (attention, then residual add, then LayerNorm), while the assignment implements the newer "Pre-LN" variant, where LayerNorm is applied inside the residual branch before attention. Pre-LN tends to train more stably, which is why later implementations prefer it.
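To make the difference concrete, here is a minimal sketch contrasting the two orderings. The `layer_norm` and `attention` functions below are toy placeholders I made up for illustration, not the actual Trax layers from the assignment:

```python
# Toy sketch of Post-LN vs Pre-LN ordering (hypothetical helper
# functions, not the assignment's Trax layers).

def layer_norm(x):
    # Placeholder normalization: subtract mean, divide by std.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    eps = 1e-6
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def attention(x):
    # Placeholder sublayer standing in for self-attention.
    return [v * 2 for v in x]

def post_ln_block(x):
    # Post-LN (the notebook image / original paper):
    # sublayer -> residual add -> LayerNorm.
    return layer_norm([xi + ai for xi, ai in zip(x, attention(x))])

def pre_ln_block(x):
    # Pre-LN (the assignment code):
    # LayerNorm -> sublayer inside the residual branch -> residual add.
    return [xi + ai for xi, ai in zip(x, attention(layer_norm(x)))]
```

Running both on the same input shows they compute genuinely different functions, even though they use the same sublayers.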

The image is taken from here and you can read more about it if you’re interested.
