Why is the Transformer architecture said to combine a CNN style of processing?

In the lecture "Transformer Network Intuition", Professor Ng says: "The major innovation of the transformer architecture is combining the use of attention based representations and a CNN convolutional neural network style of processing."
Which part of the Transformer applies a convolutional neural network (CNN)?
Does anyone know?

My guess is that it refers to the matrix multiplications, which are similar in nature to convolution operations…
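To illustrate the matrix-multiplication/convolution link, here is a minimal sketch in plain Python (the names `conv1d`, `matmul`, `X` and `W` are hypothetical, chosen for this example). It shows that a 1D convolution with kernel size 1 gives exactly the same result as multiplying every position's feature vector by one shared weight matrix. For reference, the original Transformer paper describes its position-wise feed-forward layer as equivalent to two convolutions with kernel size 1, although the Transformer does not otherwise use convolutional layers.

```python
# Sketch: a 1D convolution with kernel size 1 equals multiplying every
# position's feature vector by one shared weight matrix.

def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def conv1d(x, kernel):
    """'Valid' 1D convolution (cross-correlation, as in deep-learning libraries).

    x: sequence of n positions, each a feature vector of size d_in.
    kernel: list of k weight matrices, each d_in x d_out.
    Returns n - k + 1 output vectors of size d_out.
    """
    k = len(kernel)
    d_out = len(kernel[0][0])
    out = []
    for t in range(len(x) - k + 1):
        y = [0.0] * d_out
        for offset in range(k):
            # The same weights are reused at every position t (weight sharing).
            contrib = matmul([x[t + offset]], kernel[offset])[0]
            y = [a + b for a, b in zip(y, contrib)]
        out.append(y)
    return out

# Toy input: 4 positions (e.g. tokens), 3 features each.
X = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0],
     [2.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
# Shared weight matrix mapping 3 features to 2 features.
W = [[1.0, 0.0],
     [0.5, 1.0],
     [0.0, 2.0]]

print(conv1d(X, [W]) == matmul(X, W))  # True
```

With a kernel longer than 1, each output would also mix neighbouring positions, which is where a real convolution differs from a plain position-wise matrix multiplication.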


No part of the Transformer applies a CNN. As quoted, it is a CNN style of processing.

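As @gent.spah mentioned, it might be referring to the matrix multiplications. Moreover, in the same lecture video, Andrew explains what he means by that: an RNN processes its input one step at a time, whereas a CNN can take in many pixels (or words) at once and compute representations for them in parallel, and the Transformer processes its input in that parallel way.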
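A minimal sketch of the "parallel processing" reading, assuming the CNN comparison is about computing every position at once rather than step by step like an RNN. It uses a simplified single-head self-attention in plain Python (assumption: Q = K = V = the raw input, with no learned projection matrices) and a toy RNN; `attend`, `self_attention` and `simple_rnn` are hypothetical names for illustration, not a library API.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(x, i):
    """Self-attention output for position i (simplified: Q = K = V = x, no learned weights)."""
    d = len(x[0])
    # Scaled dot-product scores between position i and every position j.
    scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in range(len(x))]
    weights = softmax(scores)
    # Weighted average of all input vectors.
    return [sum(w * x[j][c] for j, w in enumerate(weights)) for c in range(d)]

def self_attention(x):
    # Each position only reads the inputs x, never another position's output,
    # so these calls are independent and could run in parallel, much like a
    # CNN applying its filter at every location at once.
    return [attend(x, i) for i in range(len(x))]

def simple_rnn(x, w_h=0.5):
    # Toy RNN: h_t = tanh(w_h * h_{t-1} + x_t), applied element-wise.
    # Step t cannot start until step t-1 has produced h_{t-1}.
    h = [0.0] * len(x[0])
    outputs = []
    for x_t in x:
        h = [math.tanh(w_h * h_c + x_c) for h_c, x_c in zip(h, x_t)]
        outputs.append(h)
    return outputs

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

full = self_attention(X)
# Computing the positions in reverse order gives identical results,
# because there is no sequential dependency between positions.
reversed_order = [attend(X, i) for i in reversed(range(len(X)))][::-1]
print(full == reversed_order)  # True
print(simple_rnn(X)[-1])  # the last state depends on every earlier step
```

In a real Transformer the per-position loop is replaced by a few large matrix multiplications over the whole sequence, which is what lets hardware process all positions simultaneously.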