What if special tags are part of the sentence?

If a sentence that contains <SOS> and <EOS> as literal text is passed as input to a transformer or an RNN, how does the model recognize that these are actually part of the sentence and not special tags marking the start and end of the sentence?

<SOS> and <EOS> must be unique tags that are not used in any other way. You must ensure this when the data set is created.
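As a rough illustration of that check, here is a minimal sketch that scans raw sentences for the reserved tags before the data set is built. The function name `find_reserved_collisions` and the sample corpus are made up for this example, and a plain substring match is an assumption; a real pipeline would check against whatever its tokenizer actually produces.

```python
# Reserved tags that must never appear as ordinary text in the data set.
RESERVED_TAGS = ("<SOS>", "<EOS>")

def find_reserved_collisions(sentences, reserved=RESERVED_TAGS):
    """Return (sentence_index, tag) pairs for each reserved tag found as literal text."""
    collisions = []
    for i, sentence in enumerate(sentences):
        for tag in reserved:
            if tag in sentence:  # simple substring match (an assumption)
                collisions.append((i, tag))
    return collisions

# Hypothetical corpus: the second sentence mentions a tag as plain text.
corpus = ["the cat sat", "the tag <EOS> marks the end", "hello world"]
collisions = find_reserved_collisions(corpus)
print(collisions)
```

Any sentence reported here needs handling (dropping, rewriting, or escaping) before the real start and end tags are added.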

So, if we have text that has these tags, what do we do? Should we just remove that part of the text and use the rest or is there a better solution for this?

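Typically you’d replace each literal occurrence of those tags with a different sequence of two tokens, so the reserved tags themselves never appear inside the sentence. In communications, this is called an “escape sequence”.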
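A minimal sketch of that idea, assuming the text is already split into tokens and each tag arrives as a single token. The extra `<ESC>` token and the plain marker tokens `SOS`, `EOS`, `ESC` are hypothetical choices for this example, not a standard; any two-token sequence that cannot be confused with the real tags would do.

```python
SOS, EOS, ESC = "<SOS>", "<EOS>", "<ESC>"

# Each literal tag becomes two tokens: the escape token plus a plain marker.
# The escape token itself is escaped too, so the mapping stays reversible.
ESCAPES = {SOS: [ESC, "SOS"], EOS: [ESC, "EOS"], ESC: [ESC, "ESC"]}
UNESCAPES = {"SOS": SOS, "EOS": EOS, "ESC": ESC}

def escape(tokens):
    """Replace literal tags in the sentence with their two-token escape sequences."""
    out = []
    for tok in tokens:
        out.extend(ESCAPES.get(tok, [tok]))
    return out

def unescape(tokens):
    """Invert escape(). Assumes well-formed input: every <ESC> is followed by a known marker."""
    out = []
    it = iter(tokens)
    for tok in it:
        if tok == ESC:
            out.append(UNESCAPES[next(it)])  # consume the marker after <ESC>
        else:
            out.append(tok)
    return out

def frame(tokens):
    """Escape the sentence body, then add the real start and end tags."""
    return [SOS] + escape(tokens) + [EOS]

raw = ["the", "tag", "<EOS>", "ends", "a", "sentence"]
framed = frame(raw)
print(framed)
```

After framing, `<SOS>` and `<EOS>` occur only at the boundaries, and `unescape(framed[1:-1])` recovers the original tokens. The model still has to learn what `<ESC>` means from the training data, so the same escaping must be applied consistently at training and inference time.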