In the notebook image, the structure is depicted as the attention layer, then the residual addition, and then layer norm. However, in the code it is structured as:
    # add residual layer
    tl.Residual(
        # add norm layer
        tl.LayerNorm(),
        # add attention
        attention,
        # add dropout
        dropout_,
    ),
Why is that?
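To make the difference concrete, here is a minimal pure-Python sketch of the two orderings as I understand them. It is not the notebook's code: `toy_sublayer` is a hypothetical stand-in for the attention layer, dropout is left out, and `layer_norm` has no learned scale or bias. It relies on `tl.Residual` adding its input to the output of the layers it wraps.

```python
import math


def layer_norm(x, eps=1e-6):
    # Normalize a vector to zero mean and (approximately) unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def toy_sublayer(x):
    # Hypothetical stand-in for attention (assumption, not the real layer).
    return [2.0 * v + 1.0 for v in x]


def image_ordering(x):
    # Ordering in the image: sublayer -> add residual -> layer norm.
    y = toy_sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def code_ordering(x):
    # Ordering in the code: layer norm -> sublayer, with the residual
    # added around both, i.e. x + sublayer(layer_norm(x)).
    y = toy_sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]


x = [1.0, 2.0, 3.0]
print(image_ordering(x))
print(code_ordering(x))
```

With this toy input the two functions give different results: in the image ordering the block output is always normalized, while in the code ordering the un-normalized input is passed straight through the residual path.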
