The quiz implies that parallelism is the reason transformers need positional encodings.
This is misleading and not technically accurate.
Transformers need positional information because self-attention is permutation-invariant, not because computation is parallel.
- Self-attention computes interactions among tokens without regard to order unless you explicitly inject order.
- Even if the attention mechanism were computed serially, it would still be permutation invariant (you would still need to inject positional information).
- RNNs preserve order not because they are non-parallel but because their state-transition function encodes order by design.
So the real reason is architectural, not procedural.
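To make the point concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention with the Q/K/V projection matrices omitted for brevity (with learned projections the argument is identical, since a row permutation commutes with right-multiplication by a weight matrix). Permuting the input tokens just permutes the output rows: each token's representation is unchanged by reordering, regardless of whether the computation runs in parallel or serially.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(X):
    # Scaled dot-product self-attention; Q = K = V = X for simplicity.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # (T, T) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax over keys
    return weights @ X                               # (T, d) outputs

X = rng.normal(size=(5, 8))      # 5 "tokens", 8-dim embeddings
perm = rng.permutation(5)        # a random reordering of the tokens

out = self_attention(X)
out_perm = self_attention(X[perm])

# Permuting the inputs permutes the outputs identically:
# no token's representation depends on its position.
assert np.allclose(out[perm], out_perm)
```

Adding a position-dependent term to each row of `X` before attention breaks this symmetry, which is exactly what positional encodings do.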
Even though the marked choice is the one closest to the truth, my suggestion would be to use clearer phrasing, such as:
“Transformers require positional encodings because the attention mechanism is inherently permutation-invariant; without them, swapping token order yields identical representations.”
If I remember correctly, one of the course videos also ambiguously suggested that the "parallelism" was to blame (and it did not mention permutation invariance), but that felt OK in context. The quiz answer and its explanation, on the other hand, might mislead someone into thinking the whole problem is about "parallel processing".
Cheers
P.S. I really like the course
one of the best!


