A CNN can generalize to unseen inputs (e.g., a new face) and produce an encoding for them based on what it learned during training.
But in the case of RNNs for NLP, sharing the same fixed weights across every time step seems to be a constraint that keeps them from identifying "new words" and creating encodings for them.
Ideally, should 2-D image data and 1-D sequence data (like voice or text) be treated the same way, and can a CNN also be applied to sequence data? What do you think are the primary constraints or issues that prevent CNNs from being applied to sequence data?
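To make the question concrete, here is a minimal sketch of what "applying a CNN to sequence data" would mean: sliding a filter along the time axis of a sequence of embedding vectors, the same way a 2-D filter slides over an image. All names, shapes, and the random toy data here are my own assumptions for illustration, not from any particular library's text-CNN implementation.

```python
import numpy as np

# Hypothetical toy data: 10 time steps, each a 4-dim embedding (e.g. word
# vectors), convolved with 3 filters of width 2 along the time axis.
rng = np.random.default_rng(0)
seq = rng.normal(size=(10, 4))        # (time, embed_dim)
filters = rng.normal(size=(3, 2, 4))  # (n_filters, width, embed_dim)

def conv1d(seq, filters):
    """Slide each filter along the time axis; the same weights are reused
    at every window position, just like weight sharing in a 2-D CNN."""
    n_filters, width, _ = filters.shape
    steps = seq.shape[0] - width + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        window = seq[t:t + width]  # (width, embed_dim) slice of the sequence
        # Dot each filter with the current window, summing over width and
        # embedding dimensions, to get one activation per filter.
        out[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return out

features = conv1d(seq, filters)
print(features.shape)  # (9, 3): one feature vector per window position
```

Note the parallel to the question: the filter weights here are also fixed and shared across positions, just along one axis instead of two, which is part of why I wonder whether the two cases really differ.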