Why does the embedding need to be rescaled by multiplying it by the square root of the embedding dimension?
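
For reference, the operation being asked about can be sketched in plain Python as below. This is only an illustration of the scaling step, not the course's implementation: `d_model`, `scale_embedding`, and the random initialization with standard deviation `1/sqrt(d_model)` are all assumptions made for the example.

```python
import math
import random


def scale_embedding(vec):
    """Multiply an embedding vector by sqrt(d), where d is its dimension."""
    d = len(vec)
    return [x * math.sqrt(d) for x in vec]


def l2_norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


random.seed(0)
d_model = 16  # hypothetical embedding dimension

# Assumed initialization: small values with std 1/sqrt(d_model).
embedding = [random.gauss(0.0, 1.0 / math.sqrt(d_model)) for _ in range(d_model)]

scaled = scale_embedding(embedding)

# The scaled vector is sqrt(d_model) times longer than the original.
ratio = l2_norm(scaled) / l2_norm(embedding)
print(ratio)
```

The only thing the sketch demonstrates is that the rescaling multiplies the vector's magnitude by `sqrt(d_model)` while leaving its direction unchanged.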

I think I found the answer. Please see my post: [Week 4]Exercise 5 - Encoder. Why need to scale the embedding by sqrt(d)? - #3 by Shengwu