Week 4 A1 problem with scaled_dot_product_attention

The failing assert expects the attention weights to have shape (3, 4); your code is producing a different shape at that point.

That assert checks the weights against (q.shape[0], k.shape[1]), and your shapes don't match. Check how you scaled matmul_qk and where you add the mask to the scaled tensor — did you accidentally introduce an extra tuple (a stray comma or parenthesis)? See the sketch below.
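
For reference, here is a minimal sketch of the usual structure of the function, assuming TensorFlow and the convention that mask entries equal to 1 mark positions to keep (adapt the mask line to your notebook's exact instructions):

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch: q=(..., seq_len_q, d_k), k=(..., seq_len_k, d_k), v=(..., seq_len_k, d_v)."""
    # Q @ K^T -> (..., seq_len_q, seq_len_k)
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(d_k), the depth of the keys
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Add the mask directly to the scaled logits -- no extra parentheses or
    # trailing comma here, or you build a tuple and broadcasting changes the shape.
    # (Assumes mask == 1 means "keep"; some versions use mask * -1e9 instead.)
    if mask is not None:
        scaled_attention_logits += (1.0 - mask) * -1e9

    # Softmax over the key axis -> weights of shape (seq_len_q, seq_len_k),
    # which is the (3, 4) the test expects
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)

    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```

If your weights come out with an extra dimension or the wrong second axis, the culprit is almost always the scaling or masking line rather than the softmax itself.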