I just finished Course 5 of the specialization (Sequence Models). This course was not as well-structured as the previous ones and felt somewhat rushed. Some details (especially about Transformer models) were handwaved away.
I have a question about positional encoding. I find it strange and unintuitive that the positional encoding vectors are added (summed) to the input vectors X, because summation loses information. For example, if a + b = 5, you can't later tell whether a and b were 2 and 3, or 2.5 and 2.5, and so on.
Why isn’t positional encoding passed as a separate vector (concatenated to the input, as opposed to being added to it)?
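To make the question concrete, here's a minimal numpy sketch of the two options. The encoding is the sinusoidal one from "Attention Is All You Need"; the values of `seq_len`, `d_model`, and the random `X` are toy numbers I picked just for illustration:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal encoding from "Attention Is All You Need":
    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    positions = np.arange(seq_len)[:, np.newaxis]              # (seq_len, 1)
    div_terms = 10000 ** (np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)
    pe[:, 1::2] = np.cos(positions / div_terms)
    return pe

seq_len, d_model = 8, 16               # toy sizes
X = np.random.randn(seq_len, d_model)  # stand-in for the token embeddings
pe = sinusoidal_positional_encoding(seq_len, d_model)

# What the course (and the original paper) does: elementwise sum.
# The positions and the embeddings get mixed into the same dimensions.
X_added = X + pe                       # shape stays (seq_len, d_model)

# What I'm asking about: keep the two sources of information separate.
X_concat = np.concatenate([X, pe], axis=-1)  # shape (seq_len, 2 * d_model)
```

With concatenation the model could always recover the position exactly, at the cost of a wider input; with addition the two are entangled, which is exactly what seems lossy to me.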