Yet another tangential question

Attention Models Video

Two very poorly formed questions.

  1. At a gut level, this absolutely reminds me of a Feynman diagram and propagators/number operators/second quantization. Is there anything to that statement?
  2. So I think a lot of physicists would question whether locality/causality are as real as we once thought. But there’s a pretty blatant paradox here to wonder about, as far as that’s concerned. Consider the unit of the forward-and-backward (bidirectional) recurrent neural network that forms part of the language model in the lower left-hand corner. That diamond-shaped unit, with two squares, propagates its activation \overrightarrow{a}^{<1>} forward along the RNN and also receives the reversed \overleftarrow{a}^{<1>} travelling backward along the same RNN. There would be no issue with simultaneity, except that the information from those activations is then passed along, hopefully simultaneously, to the c^{<t>} inputs of the upper, forward-only recurrent neural network, which possesses the fixed foreign-language inputs y^* that are used to train the English translation inputs x^{<i>} in the lower bidirectional recurrent neural network. (A rough sketch of the wiring I have in mind is below.)
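
In case a concrete toy helps pin down what I'm picturing, here is a minimal NumPy sketch of the lower bidirectional RNN feeding a single context vector c^{<1>}. The dimensions, the shared toy weights, and the uniform attention weights standing in for \alpha^{<1,t>} are all my own placeholders, not anything taken from the video.

```python
import numpy as np

# Toy dimensions -- my own made-up numbers, not from the lecture.
Tx, n_x, n_a = 5, 3, 4            # input length, input dim, hidden dim per direction
rng = np.random.default_rng(0)
x = rng.normal(size=(Tx, n_x))    # the inputs x^<1> ... x^<Tx>

# Shared toy parameters for both directions (a real model would use separate weights).
Wax = rng.normal(size=(n_a, n_x)) * 0.1
Waa = rng.normal(size=(n_a, n_a)) * 0.1
ba  = np.zeros(n_a)

def rnn_step(a_prev, x_t):
    """One vanilla RNN step: a^<t> = tanh(Waa a^<t-1> + Wax x^<t> + ba)."""
    return np.tanh(Waa @ a_prev + Wax @ x_t + ba)

# Forward sweep along the lower bidirectional RNN: t = 1 .. Tx.
a_fwd = np.zeros((Tx, n_a))
a_prev = np.zeros(n_a)
for t in range(Tx):
    a_prev = rnn_step(a_prev, x[t])
    a_fwd[t] = a_prev

# Backward sweep: t = Tx .. 1, run to completion before anything is consumed.
a_bwd = np.zeros((Tx, n_a))
a_prev = np.zeros(n_a)
for t in reversed(range(Tx)):
    a_prev = rnn_step(a_prev, x[t])
    a_bwd[t] = a_prev

# Each "diamond" unit's output is just the concatenation of the two halves.
a = np.concatenate([a_fwd, a_bwd], axis=1)      # shape (Tx, 2*n_a)

# Context for the first output step, with uniform weights as a stand-in for the
# learned attention weights: c^<1> = sum_t alpha^<1,t> * a^<t>.
alpha_1 = np.full(Tx, 1.0 / Tx)
c_1 = alpha_1 @ a                               # shape (2*n_a,)
print(c_1.shape)
```

In this sketch both sweeps simply run to completion before c^{<1>} is formed at all, which may already be the boring answer to my own question, but I'd like confirmation.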

I guess my question is: how can that signal be sent simultaneously from (\overrightarrow{a}^{<1>}, \overleftarrow{a}^{<1>}) to c^{<1>}, given that the two a^{<1>}'s potentially travel significantly different path lengths and take different durations of computation?

Does this involve just a huge ton of synchronous/asynchronous calls in parallelization that I never quite mastered? Or is this literally a question of simultaneity and locality?

Have you seen this?