In the 2.1 - Padding Mask section, I think it should say "the zeros" rather than "they zeros".
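For context, "the zeros" are the padding token ids that the mask should flag. Here is a rough sketch of what I understand a padding mask to do. It assumes token id 0 means padding and uses the convention 1 = real token, 0 = padding. The function name `create_padding_mask` and the (batch, 1, seq_len) output shape are my own assumptions, not necessarily what the notebook expects:

```python
import numpy as np

def create_padding_mask(seq):
    """Hypothetical sketch: seq is (batch, seq_len) token ids, 0 = padding."""
    # 1.0 where there is a real token, 0.0 where the token id is zero
    mask = 1.0 - (seq == 0).astype(np.float32)
    # add an axis so the mask can broadcast over the query dimension: (batch, 1, seq_len)
    return mask[:, np.newaxis, :]

x = np.array([[7, 6, 0, 0, 1], [1, 2, 3, 0, 0]])
print(create_padding_mask(x))
```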
Besides, as the picture below shows, q’s shape is described as (…, s, d), but when I output q’s shape, the result is (s, d): the leading … dimension is missing. I don’t know whether I have expressed this clearly. Or maybe … is the batch dimension, and the shape will change to (…, s, d) later on.
But I think it would be better to clarify this in the section, because unclear notes like these make it harder for us to finish the code. They certainly troubled me.
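If … does mean "any number of leading batch dimensions", then code written with matrix multiplication over the last two axes should handle both (s, d) and (batch, s, d). Below is a minimal NumPy sketch of scaled dot-product attention, assuming that reading of the notation. The function name and the mask convention (1 = keep, 0 = hide) are assumptions for illustration:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Hypothetical sketch: q (..., s_q, d), k (..., s_k, d), v (..., s_k, d_v)."""
    d = q.shape[-1]
    # np.matmul treats every axis before the last two as a batch axis,
    # so the same code works for (s, d) and (batch, s, d) inputs
    scores = np.matmul(q, np.swapaxes(k, -1, -2)) / np.sqrt(d)
    if mask is not None:
        # assumed convention: mask == 1 keeps a position, mask == 0 hides it
        scores = np.where(mask == 1, scores, -1e9)
    # numerically stable softmax over the last (key) axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return np.matmul(weights, v), weights

rng = np.random.default_rng(0)
q2 = rng.standard_normal((4, 8))     # (s, d): no leading "..." dimension
out2, w2 = scaled_dot_product_attention(q2, q2, q2)
q3 = rng.standard_normal((2, 4, 8))  # (batch, s, d)
out3, w3 = scaled_dot_product_attention(q3, q3, q3)
print(out2.shape, w2.shape)  # (4, 8) (4, 4)
print(out3.shape, w3.shape)  # (2, 4, 8) (2, 4, 4)
```

So an (s, d) input would not break this kind of implementation; the … simply matches zero or more extra leading axes.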
I have seen many people say this assignment is very difficult, so I think my feedback may be useful for improving it.
Thank you all for your hard work! Respect!

