Hi
I am looking for an example of how the Query ,Key ,and Value matrices are trained. I know they are extracted from a neural network that is trained using back propagation. What I want to see is an example of how a neural network (NN) is supplied with some input data set, and what the expected output is for each record, and after the training is complete, how do I use the NN to construct each of the weight matrices. Presumably, we will need 3 networks, one for each weight matrix. Of course, in real life the data sets will be huge, but for illustration we can use very small data sets.
Thanks in advance for the help!