Weight matrices - how are they constructed

Hi

I am looking for an example of how the Query, Key, and Value matrices are trained. I know they are extracted from a neural network that is trained using backpropagation. What I want to see is an example of how a neural network (NN) is supplied with some input data set, what the expected output is for each record, and, after training is complete, how I use the NN to construct each of the weight matrices. Presumably, we will need 3 networks, one for each weight matrix. Of course, in real life the data sets will be huge, but for illustration we can use very small data sets.

Thanks in advance for the help!

The Transformer is a very good place to explore the query, key, and value matrices: they are what the model uses to compute scaled dot-product attention, which in turn is the basis of the encoder-decoder attention mechanism.
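To make the roles concrete, here is a minimal numpy sketch of single-head scaled dot-product attention. All shapes and values are illustrative assumptions; the point is that W_Q, W_K, and W_V are just ordinary weight matrices of three linear layers inside one network, not something extracted from three separate networks.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model, d_k = 8, 4                    # illustrative sizes
X = rng.normal(size=(5, d_model))      # 5 tokens, each a d_model vector

# The "weight matrices": parameters of three linear projections.
# They start random and are learned by backpropagation (see below).
W_Q = rng.normal(size=(d_model, d_k))
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V    # project the same input three ways
scores = Q @ K.T / np.sqrt(d_k)        # scaled dot products, one per token pair
attn = softmax(scores, axis=-1)        # attention weights; each row sums to 1
out = attn @ V                         # output: attention-weighted mix of values
print(attn.shape, out.shape)           # (5, 5) (5, 4)
```

Each row of `attn` says how much each token attends to every other token, and the output is the corresponding weighted average of the value vectors.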

Remember that the weight matrices here produce all three of the q, k, and v vectors. The attention weights computed over the encoded sequence let the decoder's attention mechanism distribute weight according to the input provided and produce the corresponding output.
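On the original question of how the matrices are trained: you do not need three networks. W_Q, W_K, and W_V sit inside one model and are updated jointly by gradient descent on a single loss. The sketch below assumes a toy regression target and, purely for brevity, uses finite-difference gradients in place of backpropagation; the data, sizes, and learning rate are all made-up illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k, n_tok = 4, 2, 3

X = rng.normal(size=(n_tok, d_model))      # tiny "input data set"
target = rng.normal(size=(n_tok, d_k))     # toy expected output per record

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, W_Q, W_K, W_V):
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def loss(params):
    # mean squared error between attention output and the toy target
    return ((attention(X, *params) - target) ** 2).mean()

# One model, three jointly trained weight matrices.
params = [rng.normal(size=(d_model, d_k)) * 0.1 for _ in range(3)]
lr, eps = 0.2, 1e-5
first = loss(params)

for step in range(300):
    grads = []
    for W in params:
        # finite-difference gradient, standing in for backprop
        g = np.zeros_like(W)
        for idx in np.ndindex(*W.shape):
            old = W[idx]
            W[idx] = old + eps; hi = loss(params)
            W[idx] = old - eps; lo = loss(params)
            W[idx] = old
            g[idx] = (hi - lo) / (2 * eps)
        grads.append(g)
    for W, g in zip(params, grads):
        W -= lr * g                        # gradient-descent update

print(first, loss(params))                 # loss goes down during training
```

After training, the "constructed" weight matrices are simply the final values in `params`; a real framework would compute the same updates with backpropagation through the whole network at once.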