How to calculate the total number of parameters in the "Attention Is All You Need" paper

Can someone help me find the total number of parameters in the transformer base and big models from the "Attention Is All You Need" paper?

Also, please describe each term used in the calculation.

Number of parameters in each multi-head attention layer:

π‘π‘Žπ‘‘π‘‘=𝑁(π‘Šπ‘‚)+(𝑁(π‘Šπ‘„π‘–)+𝑁(π‘ŠπΎπ‘–)+𝑁(π‘Šπ‘‰π‘–))Γ—β„Ž

This is all I could come up with so far; a quick numerical check of the formula follows the list below.
Where:

  • $N(W_O)$ is the number of parameters in the output projection matrix $W_O$.
  • $N(W_{Qi})$ is the number of parameters in the query weight matrix $W_{Qi}$ of head $i$.
  • $N(W_{Ki})$ is the number of parameters in the key weight matrix $W_{Ki}$ of head $i$.
  • $N(W_{Vi})$ is the number of parameters in the value weight matrix $W_{Vi}$ of head $i$.
  • $h$ is the number of attention heads.
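
To sanity-check the formula, here is a minimal Python sketch that evaluates it with the hyperparameters reported in Table 3 of the paper (base: $d_{model} = 512$, $h = 8$; big: $d_{model} = 1024$, $h = 16$; both use $d_k = d_v = 64$). The function name `attention_params` is my own, the matrix shapes follow Section 3.2.2 of the paper, and bias terms are ignored since the projections are plain matrix multiplications:

```python
def attention_params(d_model, h, d_k=None, d_v=None):
    """N_att = N(W_O) + (N(W_Qi) + N(W_Ki) + N(W_Vi)) * h."""
    # Per-head dimensions default to d_model / h, as in the paper.
    d_k = d_k if d_k is not None else d_model // h
    d_v = d_v if d_v is not None else d_model // h
    n_wq = d_model * d_k      # W_Qi: d_model x d_k
    n_wk = d_model * d_k      # W_Ki: d_model x d_k
    n_wv = d_model * d_v      # W_Vi: d_model x d_v
    n_wo = h * d_v * d_model  # W_O:  (h * d_v) x d_model
    return n_wo + (n_wq + n_wk + n_wv) * h

print(attention_params(d_model=512, h=8))    # base model: 1,048,576
print(attention_params(d_model=1024, h=16))  # big model:  4,194,304
```

If this is correct, each multi-head attention sub-layer contributes $4 d_{model}^2$ parameters whenever $h \cdot d_k = h \cdot d_v = d_{model}$, which holds for both configurations; the rest of the model's parameters would then have to come from the feed-forward layers, the embeddings, and the layer-norm gains and biases.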