So the trax implementation of AttentionQKV seems a bit different from the way it was explained in the videos, specifically the parallel Dense layers on the Q, K, V inputs and the output Dense layer. Why do we need those Dense layers?
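For reference, this is roughly the structure I'm asking about, paraphrased from `trax.layers.attention` (exact layer names may differ between trax versions):

```python
from trax import layers as tl

def AttentionQKV(d_feature, n_heads=1, dropout=0.0, mode='train'):
    # Q, K, V each pass through their own Dense layer before attention,
    # and the attention result goes through a final Dense layer.
    return tl.Serial(
        tl.Parallel(
            tl.Dense(d_feature),  # learned projection for queries
            tl.Dense(d_feature),  # learned projection for keys
            tl.Dense(d_feature),  # learned projection for values
        ),
        tl.PureAttention(n_heads=n_heads, dropout=dropout, mode=mode),
        tl.Dense(d_feature),      # output projection
    )
```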
Why couldn't we apply the QKV attention formula directly, the way it was explained? Is there somewhere I can read more about this?
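To be concrete, by "applying the formula directly" I mean plain scaled dot-product attention on the raw Q, K, V inputs, with no learned weights at all. A toy sketch of my own (plain numpy, not from trax):

```python
import numpy as np

def plain_attention(q, k, v):
    # softmax(Q K^T / sqrt(d_k)) V, with no Dense projections anywhere
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

So my question is why trax wraps this computation in the four Dense layers instead of using it as-is.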