Masking: np.where, np.full_like

Hello, I am having trouble understanding this line of code for masking:
np.where(m, dots, np.full_like(dots, -1e9)). I would appreciate some help.

I played with the example below from ungraded lab 1:
q = create_tensor([[1, 0, 0], [0, 1, 0]])
display_tensor(q, 'query')
k = create_tensor([[1, 2, 3], [4, 5, 6]])
display_tensor(k, 'key')
m = create_tensor([[0, 0], [-1e9, 0]])
display_tensor(m, ‘mask’)

if m is not None:
    dots = np.where(m, dots, np.full_like(dots, -1e9))

So this is my m:
[[ 0.e+00  0.e+00]
 [-1.e+09  0.e+00]]

This is my dots before the mask:
array([[0.57735027, 2.30940108],
       [1.15470054, 2.88675135]])

This is my dots after the mask:
array([[-1.00000000e+09, -1.00000000e+09],
       [ 1.15470054e+00, -1.00000000e+09]])

So I am guessing this line of code, np.where(m, dots, np.full_like(dots, -1e9)),
is saying: where m is 0, we replace the value with a large negative number (-1e9), and where m is not 0 (that's below the diagonal) we keep it. Am I right?
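Here is how I checked that reading in plain NumPy (not the lab code, just the same values, with np.array standing in for create_tensor):

import numpy as np

m = np.array([[0.0, 0.0], [-1e9, 0.0]])      # the mask built in the lab
dots = np.array([[0.57735027, 2.30940108],
                 [1.15470054, 2.88675135]])  # dots before the mask

# np.full_like(dots, -1e9): an array with the same shape and dtype as dots, filled with -1e9
# np.where(m, dots, filler): keep dots where m is non-zero (truthy), take the filler where m is 0
masked = np.where(m, dots, np.full_like(dots, -1e9))
print(masked)
# [[-1.00000000e+09 -1.00000000e+09]
#  [ 1.15470054e+00 -1.00000000e+09]]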

But I thought we were supposed to mask with a matrix like this:
image
So shouldn't the mask m be
[[0, -inf]
 [0,    0]]
instead of the one from the lab,
[[ 0.e+00  0.e+00]
 [-1.e+09  0.e+00]]?
If I follow the picture, my line of code should be something like:
dots = dots + [[0, -inf],
               [0,    0]]

Hi @Fei_Li

Good job spotting this mistake :+1:

Yes, the m variable created in the lab is wrong (at least, it does not reflect the picture you mentioned or the usual convention; in theory you could make it work if you transpose it or transpose the dots, but that is not what usually happens).

I will submit it for fixing so that future learners will not get confused by it, thanks to you :slight_smile:

Sure. Thank you. I was trying to learn though.

So you mean applying a transpose to the mask, like this?
m = create_tensor([[0, 0], [-1e9, 0]]).T
Then my mask would be:
image

Then I use this line of code,
if m is not None:
    dots = np.where(m, dots, np.full_like(dots, -1e9))

to apply it to my dots, which is:
image

I get
image

I thought the numbers above the diagonal should be 0, but now it is the ones below. Also, the numbers on the diagonal should not be zeros.

I still don't understand np.full_like and np.where(m, dots, …), would you help me with that as well?

Appreciate your help.

Yes, that is the way the mask is supposed to look according to the pictures (when we use the mask as an addition to the dots).

So the continuation should be:
image
and when we apply the mask as in the picture, we get:
image
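To make that addition step concrete, here is a minimal sketch with the same numbers (plain NumPy, not the Lab's code; -1e9 stands in for -inf):

import numpy as np

dots = np.array([[0.57735027, 2.30940108],
                 [1.15470054, 2.88675135]])
m = np.array([[0.0, -1e9],
              [0.0,  0.0]])   # additive mask: a huge negative number above the diagonal

masked = dots + m
# row 0: [ 0.5773...,  ~-1e9    ]  <- the "future" position is pushed to a huge negative value
# row 1: [ 1.1547...,  2.8867...]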

Below is some additional input for your learning (this is not in the Lab):


To get the attention scores we would apply softmax:
image

These would be the attention weights. How to interpret them:

  • for the first token (word) we would assign all the attention to the first token and ignore the second; (or concretely: 1.0 * first_token_embeddings + 0.0 * second_token_embeddings)
  • for the second token we would assign 85% of attention to the second and 15% attention to the first; (or concretely: 0.15 * first_token_embeddings + 0.85 * second_token_embeddings)
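If you want to reproduce those two bullets, here is a sketch of the softmax step in plain NumPy (my own illustration, not the Lab's code, starting from the masked dots above with the masked entry rounded to -1e9):

import numpy as np

masked = np.array([[0.57735027, -1e9],
                   [1.15470054, 2.88675135]])  # dots after the additive mask

# row-wise softmax: shift by the row max for numerical stability, exponentiate, normalize
exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights = exp / exp.sum(axis=-1, keepdims=True)
print(np.round(weights, 2))
# [[1.   0.  ]
#  [0.15 0.85]]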

Unfortunately (or fortunately) this Lab also has another implementation of DotProductAttention with a different approach, which might confuse you. In the second approach, the mask is generated with 1s and is consequently Boolean. Here is the second implementation (not the one in the pictures, but similar nonetheless):
image
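In case the np.where / np.full_like pair is still unclear, here is a minimal sketch of that Boolean-mask style (my own illustration using a lower-triangular mask, not the Lab's exact code):

import numpy as np

dots = np.array([[0.57735027, 2.30940108],
                 [1.15470054, 2.88675135]])

# Boolean-style causal mask: 1 on and below the diagonal (positions a token may attend to)
mask = np.tril(np.ones_like(dots))
# [[1. 0.]
#  [1. 1.]]

# keep dots where the mask is truthy; elsewhere take the -1e9 filler built by np.full_like
masked = np.where(mask, dots, np.full_like(dots, -1e9))
# [[ 0.5773... -1e+09   ]
#  [ 1.1547...  2.8867...]]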

Ah, sure, sure. Now it's so much clearer. I compared your code with mine and realized I was using "dots + mask" and then applying "np.where(mask, dots, …)" again…

Also, thanks for the additional explanation of the weights. Now I understand "attention" better.

I didn't understand this line:

np.swapaxes(key, -1, -2)

Why should we transpose only the last two axes?

and

logsumexp = scipy.special.logsumexp(dots, axis=-1, keepdims=True)

How was I supposed to arrive at axis=-1, keepdims=True from the description?
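For reference, here is a quick shape check of both calls (plain NumPy/SciPy with made-up shapes, just to see what they do):

import numpy as np
from scipy.special import logsumexp

# a key tensor with a made-up batched shape: (batch, seq_len, d_head)
key = np.ones((8, 5, 64))

# swapping only the last two axes leaves the batch axis alone,
# which is why -1, -2 is used instead of a full transpose
print(np.swapaxes(key, -1, -2).shape)                  # (8, 64, 5)

# logsumexp over the last axis with keepdims=True keeps that axis as size 1,
# so the result still broadcasts when subtracted from dots
dots = np.ones((8, 5, 5))
print(logsumexp(dots, axis=-1, keepdims=True).shape)   # (8, 5, 1)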