In the last cell of the transformer lab, why do we need to normalize again after the decoder blocks?

{moderator edit - solution code removed}

After we add the decoder blocks, why do we need to normalize again? That step is not shown in the schematic depiction.

Normalization in general helps convergence and speeds up model training, so it is good practice to use it.
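
To make that concrete, here is a minimal sketch of what a layer normalization step does to a single feature vector. This is not the lab's solution code: the `layer_norm` helper and the `decoder_output` values are made up for illustration, and the sketch leaves out the learnable scale and shift that a real layer normalization usually has.

```python
import math

def layer_norm(x, eps=1e-5):
    """Rescale one feature vector to roughly zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Hypothetical activations coming out of the last decoder block (assumed values).
decoder_output = [12.0, -30.0, 45.0, 3.0]
normalized = layer_norm(decoder_output)

# Check the statistics after normalization.
mean_after = sum(normalized) / len(normalized)
var_after = sum((v - mean_after) ** 2 for v in normalized) / len(normalized)
print(mean_after, var_after)
```

Whatever scale the decoder output had, the next layer sees values in a predictable range, which is generally what makes training more stable.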