If I understand you correctly, then this number would be 32. In other words,

- if the embedding dimension is 32 and we have 1 head, then the head dimension is 32;
- if the embedding dimension is 32 and we have 4 heads, then the head dimension is 8.
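As a quick sanity check, here is a minimal sketch of that arithmetic (assuming the usual convention that the embedding dimension is split evenly across heads; `head_dim` is a hypothetical helper, not from the assignment):

```python
def head_dim(d_model, n_heads):
    # Each head works on an equal slice of the embedding dimension.
    assert d_model % n_heads == 0, "embedding dim must divide evenly across heads"
    return d_model // n_heads

print(head_dim(32, 1))  # -> 32
print(head_dim(32, 4))  # -> 8
```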

To illustrate what happens with more concrete steps (this time with embedding dimension 16):

For example, the input is “word1 word2” → tokenize(input) → [2, 54] → Embed([2, 54]) → a 2D matrix of shape (2, 16):

```
[[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]]
```

*Note: the values here are just indexes; the real values would be ordinary floats (like 0.23, -0.10, etc.).*
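The embedding step is just a row lookup into a learned table (a minimal numpy sketch; the vocab size and random table here are placeholders, only the token ids [2, 54] come from the example above):

```python
import numpy as np

np.random.seed(0)
vocab_size, d_model = 100, 16               # hypothetical vocab size, embedding dim 16
embedding_table = np.random.randn(vocab_size, d_model) * 0.1

token_ids = np.array([2, 54])               # tokenize("word1 word2") in the example
embedded = embedding_table[token_ids]       # one row per token
print(embedded.shape)                       # -> (2, 16)
```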

→ get_Q() = Embed([2, 54]) \cdot W_q → Q:

```
[[0,1,2,3,|4,5,6,7,|8,9,10,11,|12,13,14,15],
[0,1,2,3,|4,5,6,7,|8,9,10,11,|12,13,14,15]]
```

→ get_K() = Embed([2, 54]) \cdot W_k → K:

```
[[0,1,2,3,|4,5,6,7,|8,9,10,11,|12,13,14,15],
[0,1,2,3,|4,5,6,7,|8,9,10,11,|12,13,14,15]]
```

→ get_V() = Embed([2, 54]) \cdot W_v → V:

```
[[0,1,2,3,|4,5,6,7,|8,9,10,11,|12,13,14,15],
[0,1,2,3,|4,5,6,7,|8,9,10,11,|12,13,14,15]]
```
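The three projections above are plain matrix multiplies with learned weight matrices (a sketch with random placeholder weights, not the assignment's trained ones):

```python
import numpy as np

np.random.seed(1)
d_model = 16
E = np.random.randn(2, d_model)             # stands in for Embed([2, 54]) above
W_q = np.random.randn(d_model, d_model)     # learned projection weights (random here)
W_k = np.random.randn(d_model, d_model)
W_v = np.random.randn(d_model, d_model)

Q, K, V = E @ W_q, E @ W_k, E @ W_v
print(Q.shape, K.shape, V.shape)            # each -> (2, 16)
```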

*Note: “|” is just for illustration purposes. In reality each of Q, K, V is a plain matrix as before, which is later split into a 3D tensor, one slice per head, along these “|” boundaries (in this week’s assignment these splits, the “crazy” reshapes and transposes, happen in `compute_attention_heads_closure`). So “|” marks each head’s “working” dimensions (4 numbers here, 8 in the previous post, 3 in your pictures).*
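That split along the “|” boundaries is a reshape plus a transpose (a sketch of the idea, not the assignment's exact code; I use index values as in the matrices above so you can see which columns each head gets):

```python
import numpy as np

n_heads, d_head = 4, 4
Q = np.arange(32).reshape(2, 16)            # stand-in for the (2, 16) Q above

# Split the last dimension into heads: (2, 16) -> (2, 4, 4) -> (4, 2, 4).
Q_heads = Q.reshape(2, n_heads, d_head).transpose(1, 0, 2)
print(Q_heads.shape)                        # -> (4, 2, 4)
print(Q_heads[0])                           # first head: columns 0..3 of each row
```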

Now when you get the attention weights (via softmax(Q \cdot K^T / \sqrt{d_head}), computed per head), you can represent them as a 3D tensor:

```
[
 [[0,1],   # head 1: Q_0_0 @ K_1_0^T, Q_0_0 @ K_0_0^T
  [0,1]],  #         Q_1_0 @ K_1_0^T, Q_1_0 @ K_0_0^T
 [[0,1],   # head 2
  [0,1]],
 [[0,1],   # head 3
  [0,1]],
 [[0,1],   # head 4
  [0,1]],
]
```
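Computed per head, that (4, 2, 2) tensor comes out of a batched matmul plus softmax (a minimal sketch with random Q and K; shapes match the example above):

```python
import numpy as np

np.random.seed(2)
n_heads, seq_len, d_head = 4, 2, 4
Q = np.random.randn(n_heads, seq_len, d_head)
K = np.random.randn(n_heads, seq_len, d_head)

# Scaled dot-product scores per head: (4, 2, 4) @ (4, 4, 2) -> (4, 2, 2).
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
print(weights.shape)                        # -> (4, 2, 2)
print(weights.sum(axis=-1))                 # each row sums to 1
```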

When you apply this attention to V (attention \cdot V), the output is Z_1, Z_2, Z_3, Z_4, which after concatenation (again the “crazy” reshapes and transposes, this time in `compute_attention_output_closure`) becomes a 2D matrix of shape (2, 16) (like in your bottom picture). Also note that the order of these attention dimensions does not match the Assignment; it is just for illustration.
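The final step, applying the weights to V and concatenating the heads back, is the head split in reverse (a sketch with dummy uniform weights; the reshape/transpose is the part that mirrors `compute_attention_output_closure` in spirit):

```python
import numpy as np

np.random.seed(3)
n_heads, seq_len, d_head = 4, 2, 4
weights = np.full((n_heads, seq_len, seq_len), 0.5)  # dummy attention weights
V = np.random.randn(n_heads, seq_len, d_head)

Z = weights @ V                             # per-head outputs Z_1..Z_4: (4, 2, 4)

# Undo the head split: (4, 2, 4) -> (2, 4, 4) -> (2, 16).
out = Z.transpose(1, 0, 2).reshape(seq_len, n_heads * d_head)
print(out.shape)                            # -> (2, 16)
```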