Week 3 Assignment - Why couldn't we have used tl.Attention() in Week 2's assignment?

In Week 2’s assignment, we had to write an attention function from scratch, but in this week’s assignment it turns out that Trax already provides a built-in attention layer, tl.Attention().

Why did we have to write the function ourselves in Week 2? Was it to help us understand multi-headed attention on a technical level? Or was it that the use of masking made tl.Attention() unusable?

Hi @Harvey_Wang

It was to help us understand multi-head attention, and because of that I think it was one of the best assignments in the course :slight_smile:
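For anyone curious, the core of what we built in Week 2 is essentially masked, scaled dot-product attention. Here is a minimal NumPy sketch of that idea (the function and variable names are illustrative only, not the exact assignment code or the tl.Attention() implementation):

```python
import numpy as np

def masked_dot_product_attention(q, k, v, mask):
    """Scaled dot-product attention with a mask.

    q, k, v: arrays of shape (seq_len, d_head)
    mask:    boolean array of shape (seq_len, seq_len); True means "may attend"
    Illustrative sketch only -- not the assignment solution or Trax's layer.
    """
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)           # (seq_len, seq_len)
    scores = np.where(mask, scores, -1e9)        # block masked positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                           # (seq_len, d_head)

# Causal (look-ahead) mask: position i may only attend to positions <= i
seq_len, d_head = 4, 8
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
q = k = v = np.random.rand(seq_len, d_head)
out = masked_dot_product_attention(q, k, v, causal_mask)
print(out.shape)  # (4, 8)
```

Writing this by hand (and then splitting it across heads) is what makes the multi-head version click, whereas calling a prebuilt layer hides all of it.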

Cheers
