I am stuck on this same exercise. The test currently fails with an error stating that the final positional encoding should have shape (1, positions, d_model), which for this test is apparently (1, 8, 16). I copied the code from my function into a new cell, ran it with positions=8 and d=16, and got the correct shape. Is there something wrong with the grading function?
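For reference, the standalone check described above can be sketched roughly as follows. This is only a minimal sketch, assuming a NumPy implementation of the standard sinusoidal positional encoding; the function name `positional_encoding_sketch` and its body are illustrative assumptions, not the exercise's reference solution or the code that was actually graded.

```python
import numpy as np

def positional_encoding_sketch(positions, d_model):
    # Hypothetical helper for illustration; not the course's function.
    pos = np.arange(positions)[:, np.newaxis]   # shape (positions, 1)
    i = np.arange(d_model)[np.newaxis, :]       # shape (1, d_model)
    # Each pair of dimensions (2k, 2k+1) shares one frequency.
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / float(d_model))
    angles = pos * angle_rates                  # shape (positions, d_model)
    angles[:, 0::2] = np.sin(angles[:, 0::2])   # sine on even indices
    angles[:, 1::2] = np.cos(angles[:, 1::2])   # cosine on odd indices
    return angles[np.newaxis, ...]              # add the leading batch axis

pe = positional_encoding_sketch(8, 16)
print(pe.shape)  # (1, 8, 16)
```

Calling the same function with different arguments, for example `positional_encoding_sketch(4, 8)`, is a quick way to confirm that the output shape follows the parameters passed in rather than values defined elsewhere in the notebook.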