Why is Positional Encoding added to the input, instead of being concatenated to it?

I just finished Course 5 of the specialization (Sequence Models). This course was not as well structured as the previous ones and felt somewhat rushed. Some details (especially about the Transformer model) were hand-waved away.

I have a question about positional encoding. I found it weird and unintuitive that the positional encoding vectors are added elementwise to the input embedding vectors (X), because summation generally causes a loss of information. For example, if a + b = 5, you can't later infer whether a and b were 2 and 3, or 2.5 and 2.5, etc.

Why isn't the positional encoding passed as a separate vector (concatenated to the input, rather than added to it)? See the sketch below for what I mean, concretely.
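
Here is a minimal sketch of the two options (plain NumPy, hypothetical sizes; the placeholder values stand in for real embeddings and encodings). Note that concatenation doubles the width that every downstream weight matrix has to handle:

```python
import numpy as np

# Hypothetical sizes: sequence length 4, embedding dimension 8.
seq_len, d_model = 4, 8

rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))  # token embeddings (placeholder values)
P = rng.normal(size=(seq_len, d_model))  # positional encodings (placeholder values)

added = X + P                                   # what the Transformer does: shape stays (4, 8)
concatenated = np.concatenate([X, P], axis=-1)  # the alternative: shape grows to (4, 16)

print(added.shape)         # (4, 8)
print(concatenated.shape)  # (4, 16)
```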

There is no loss of information in your example, because the positional encoding is not an arbitrary unknown: it is a fixed, known function of the position, and we do not discard the original features. In a + b = 5, if you know b, then a is fully determined by the sum. We are adding a known quantity that provides a constraint, so it adds information rather than destroying it.
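
To make this concrete, here is a minimal sketch assuming the standard sinusoidal encoding from "Attention Is All You Need" (the sizes are hypothetical). Because the encoding is a deterministic function of position, subtracting it recovers the original embedding exactly, which is the sense in which the sum loses nothing:

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    # Standard sinusoidal encoding:
    #   PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    #   PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / 10000 ** (i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))  # token embeddings: the "a" in a + b
P = sinusoidal_pe(seq_len, d_model)      # positional encodings: the "b", fixed and known

summed = X + P          # a + b, what gets fed into the first layer
recovered = summed - P  # since b is known, a is recoverable exactly
print(np.allclose(recovered, X))  # True
```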