Discussion on CNN vs RNN

A CNN has the capability to handle "new data" (e.g. a new face) and create an encoding for it based on what it learned during training.
But in the case of RNNs/NLP, it seems that sharing the same weights across every time step becomes a constraint when it comes to identifying "new words" and creating encodings for them.

Ideally, should 2-D image data and 1-D sequence data (like voice/text) be treated the same way, and can a CNN also be applied to sequence data? What do you think are the primary constraints/issues that prevent a CNN from being applied to sequence data?

Conv1D can be applied to text embeddings or directly to sequence data.

Please look at the "Used in the tutorials" section at the same link above to find examples.
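For illustration, here is a minimal sketch (not from the linked tutorials) of Conv1D applied on top of text embeddings with Keras. The vocabulary size, sequence length, embedding dimension, and filter counts are all assumed placeholder values:

```python
# Minimal sketch: Conv1D over embedded text (all sizes are illustrative assumptions)
from tensorflow.keras import layers, models

vocab_size = 10000   # assumed vocabulary size
seq_len = 100        # assumed (padded) sequence length
embed_dim = 128      # assumed embedding dimension

inputs = layers.Input(shape=(seq_len,), dtype="int32")
# Map integer word indices to dense vectors: (batch, seq_len, embed_dim)
x = layers.Embedding(vocab_size, embed_dim)(inputs)
# The 1-D convolution slides a window along the time axis,
# analogous to a 2-D convolution sliding over image height/width
x = layers.Conv1D(filters=64, kernel_size=5, activation="relu")(x)
# Collapse the time dimension into a fixed-size encoding of the sequence
x = layers.GlobalMaxPooling1D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # e.g. binary sentiment

model = models.Model(inputs, outputs)
model.summary()
```

The same Conv1D layer could also be applied directly to raw 1-D signals (e.g. audio samples) by replacing the Embedding layer with a numeric input of shape (timesteps, channels).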