In the lecture "Transformer Network Intuition", Professor Ng said: "The major innovation of the transformer architecture is combining the use of attention based representations and a CNN convolutional neural network style of processing."
Which part of the Transformer applies the CNN (convolutional neural network)?
Does anyone know?
I guess it is the matrix multiplications, which are similar in nature to convolution operations…
No part of the Transformer applies a CNN. As the quote says, it is a CNN style of processing.
As @gent.spah mentioned, it might be referring to the matrix multiplication. Moreover, in the same lecture video, Andrew explains what he means by that.
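For what it's worth, here is one way to see the resemblance between matrix multiplication and convolution. This is only a sketch of the general idea, not necessarily what the lecture means. Applying the same weight matrix to every position of a sequence gives the same result as sliding a convolution with kernel size 1 along it. The toy values and the helper names `matmul` and `conv1d_kernel_size_1` are made up for illustration.

```python
# Toy sequence: 3 positions, each a 2-dimensional vector (values are made up).
X = [[1, 2], [3, 4], [5, 6]]
# Shared weights mapping 2 input features to 3 output features (also made up).
W = [[1, 0, 2], [0, 1, 3]]


def matmul(X, W):
    """Plain matrix product X @ W: every position is transformed in one operation."""
    return [
        [sum(x[k] * W[k][j] for k in range(len(W))) for j in range(len(W[0]))]
        for x in X
    ]


def conv1d_kernel_size_1(X, W):
    """Slide a width-1 window along the sequence, reusing the same weights each step."""
    out = []
    for t in range(len(X)):
        window = X[t]  # a kernel of size 1 sees exactly one position
        out.append(
            [sum(window[k] * W[k][j] for k in range(len(W))) for j in range(len(W[0]))]
        )
    return out


dense = matmul(X, W)
conv = conv1d_kernel_size_1(X, W)
print(dense == conv)  # True: both apply the same weights at every position
```

Because each output row depends only on its own input row, the positions do not have to be processed one after another as in an RNN. In this sketch nothing mixes information across positions; in a Transformer that part is handled by attention.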