Weight matrices - how are they constructed

Hi

I am looking for an example of how the Query, Key, and Value matrices are trained. I know they are extracted from a neural network that is trained using backpropagation. What I want to see is an example of how a neural network (NN) is supplied with some input data set, what the expected output is for each record, and, after training is complete, how I use the NN to construct each of the weight matrices. Presumably, we will need 3 networks, one for each weight matrix. Of course, in real life the data sets will be huge, but for illustration we can use very small data sets.

Thanks in advance for the help!

The Transformer is a very good place to explore the query, key, and value matrices: they are what the model uses to compute scaled dot-product attention, which in turn is the basis of the encoder-decoder attention mechanism.
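To make the roles concrete, here is a minimal numpy sketch of single-head scaled dot-product attention. All shapes and values are illustrative assumptions; the point is that W_Q, W_K, and W_V are just ordinary weight matrices of three linear layers inside one network, not something extracted from three separate networks.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model, d_k = 8, 4                    # illustrative sizes
X = rng.normal(size=(5, d_model))      # 5 tokens, each a d_model vector

# The "weight matrices": parameters of three linear projections.
# They start random and are learned by backpropagation (see below).
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V    # project the same input three ways
scores = Q @ K.T / np.sqrt(d_k)        # scaled dot products, one per token pair
attn = softmax(scores, axis=-1)        # attention weights; each row sums to 1
out = attn @ V                         # output: attention-weighted mix of values
print(attn.shape, out.shape)           # (5, 5) (5, 4)
```

Each row of `attn` says how much each token attends to every other token, and the output is the corresponding weighted average of the value vectors.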

Remember that the weight matrices here produce all three of the q, k, and v vectors. The attention weights computed over the encoded sequence let the decoder's attention mechanism distribute weight according to the input provided and produce the corresponding output.
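On the original question of how the matrices are trained: you do not need three networks. W_Q, W_K, and W_V sit inside one model and are updated jointly by gradient descent on a single loss. The sketch below assumes a toy regression target and, purely for brevity, uses finite-difference gradients in place of backpropagation; the data, sizes, and learning rate are all made-up illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k, n_tok = 4, 2, 3

X = rng.normal(size=(n_tok, d_model))      # tiny "input data set"
target = rng.normal(size=(n_tok, d_k))     # toy expected output per record

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, W_Q, W_K, W_V):
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def loss(params):
    # mean squared error between attention output and the toy target
    return ((attention(X, *params) - target) ** 2).mean()

# One model, three jointly trained weight matrices.
params = [rng.normal(size=(d_model, d_k)) * 0.1 for _ in range(3)]
lr, eps = 0.2, 1e-5
first = loss(params)

for step in range(300):
    grads = []
    for W in params:
        # finite-difference gradient, standing in for backprop
        g = np.zeros_like(W)
        for idx in np.ndindex(*W.shape):
            old = W[idx]
            W[idx] = old + eps; hi = loss(params)
            W[idx] = old - eps; lo = loss(params)
            W[idx] = old
            g[idx] = (hi - lo) / (2 * eps)
        grads.append(g)
    for W, g in zip(params, grads):
        W -= lr * g                        # gradient-descent update

print(first, loss(params))                 # loss goes down during training
```

After training, the "constructed" weight matrices are simply the final values in `params`; a real framework would compute the same updates with backpropagation through the whole network at once.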