Error in practice quiz question 10

The answer marked as correct for this question asserts that the BLEU score is better for the original Transformer than for the reversible layer. However, according to the lecture video (5:11) and the Reformer paper (see Table 4), the opposite is true.

Hi @CWKoo

Welcome to the community :slight_smile:

Well… it’s quite ambiguous. The video mentions (5:27): “… It’s really because there’s been some hyperparameter tuning into three years since the original transformer paper was published. …” I personally interpret that as the Reformer having the advantage of time (2020 vs. 2017-2018).

Also, in the Reformer paper the “big” model has better scores:

[image]

In the lecture video, the Reformer is compared to the 2017 version of the Transformer (and, strangely, not to the 2018 one).

Theoretically, the Reformer should not outperform regular Transformers on these quality metrics, since the Reformer is optimized for faster training/inference and lower memory requirements while losing only minimally on quality.
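For context on why the reversible layer is a memory optimization rather than a quality improvement: a reversible residual block lets the inputs be recomputed from the outputs, so activations do not have to be stored for the backward pass. Here is a minimal NumPy sketch of that idea. The functions `f` and `g` are hypothetical toy stand-ins for the attention and feed-forward sublayers; they are my own illustration and are not taken from the course or the paper's code.

```python
import numpy as np

def f(x):
    # Hypothetical stand-in for the attention sublayer (assumption).
    return np.tanh(x)

def g(x):
    # Hypothetical stand-in for the feed-forward sublayer (assumption).
    return 0.5 * np.tanh(2.0 * x)

def reversible_forward(x1, x2):
    # Reversible residual block: y1 = x1 + f(x2), y2 = x2 + g(y1).
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def reversible_inverse(y1, y2):
    # Recover the inputs from the outputs by undoing the steps in reverse order.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

rng = np.random.default_rng(0)
x1 = rng.normal(size=(4, 8))
x2 = rng.normal(size=(4, 8))

y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)

# Check that the recomputed inputs match the originals up to rounding.
print(np.allclose(r1, x1), np.allclose(r2, x2))
```

Nothing in this construction is aimed at improving BLEU; it only trades extra recomputation for lower memory use.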

But the way quiz question 10 is formulated is actually the opposite of this (or at least ambiguous). To me, it suggests that the Reformer is older than the regular Transformer by 3 years (which is obviously not true) and that this is the reason why it has better scores… :slight_smile:

I will submit a request for a better formulation of the question.

Thanks for bringing this up.
Cheers
