Hi @saileshbaidya
I can also recommend checking out this great summary of multi-head attention mechanisms by NLP mentor @arvyzukai:
Best regards,
Christian