[Week 2] Assignment 1: Operations on Word Vectors - Debiasing - Exercise 5.2

Same here.

I’m fairly confident that my formulas are correct, and it also seems reasonable that the result should be symmetric.

I agree with the argument that, after equalizing, the similarity values should stay close to the original ones. I get

cosine_similarity(e1, gender) = -0.2423973757231344
cosine_similarity(e2, gender) = 0.24239737572313433

which is reasonably close to the original values

cosine_similarity(word_to_vec_map["man"], gender) = -0.11711095765336832
cosine_similarity(word_to_vec_map["woman"], gender) = 0.35666618846270376
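As an aside, the cosine similarity everyone is printing here is just the normalized dot product. A minimal sketch (my own reconstruction of a `cosine_similarity` helper, not the course code):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: (u . v) / (||u|| ||v||)."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Sanity checks on simple vectors:
print(cosine_similarity(np.array([1.0, 0.0]), np.array([2.0, 0.0])))    # parallel -> 1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 3.0])))    # orthogonal -> 0.0
print(cosine_similarity(np.array([1.0, 2.0]), np.array([-1.0, -2.0])))  # opposite -> -1.0 (up to rounding)
```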


Same result as for all the others… Any resolution on this, @Kic?


Hi @gugger,

I still have not heard any further news on this one. I will be in touch as soon as I do.


Got the same result. Wonder what’s amiss.


Same result here as well


Same result:

cosine similarities after equalizing:
cosine_similarity(e1, gender) =  -0.23142942644908535
cosine_similarity(e2, gender) =  0.23142942644908535

Same here.
cosine similarities after equalizing:
cosine_similarity(e1, gender) = -0.7004364289309388
cosine_similarity(e2, gender) = 0.7004364289309388

Also, within each pair the two values match in magnitude to about 15 decimal places whether or not the words are related via gender, so the bias_axis makes no difference to the symmetry:

def try_words(w1, w2):
  e1, e2 = equalize((w1, w2), g, word_to_vec_map)
  print("cosine similarities after equalizing:")
  print(f"cosine_similarity(e'{w1}', gender) = ", cosine_similarity(e1, g))
  print(f"cosine_similarity(e'{w2}', gender) = ", cosine_similarity(e2, g))
try_words("man", "woman")
try_words("tree", "frog")
try_words("water", "over")

cosine similarities after equalizing:
cosine_similarity(e'man', gender) = -0.7004364289309388
cosine_similarity(e'woman', gender) = 0.7004364289309388
cosine similarities after equalizing:
cosine_similarity(e'tree', gender) = -0.14225878555270474
cosine_similarity(e'frog', gender) = 0.1422587855527047
cosine similarities after equalizing:
cosine_similarity(e'water', gender) = 0.03853668934726111
cosine_similarity(e'over', gender) = -0.038536689347261226
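For what it’s worth, the symmetry itself is expected by construction: equalize() maps both words to mu_orth plus equal and opposite components along the bias axis, so the two cosine similarities come out as exact negatives whether or not the pair has anything to do with gender. A sketch of that construction on random vectors (my own reconstruction following the notebook’s equations, with hypothetical names, not the grader’s code):

```python
import numpy as np

def equalize_sketch(e_w1, e_w2, bias_axis):
    # Mean of the pair, split into its bias component and orthogonal part.
    mu = (e_w1 + e_w2) / 2.0
    mu_B = (np.dot(mu, bias_axis) / np.sum(bias_axis ** 2)) * bias_axis
    mu_orth = mu - mu_B
    # Bias components of each word.
    e_w1B = (np.dot(e_w1, bias_axis) / np.sum(bias_axis ** 2)) * bias_axis
    e_w2B = (np.dot(e_w2, bias_axis) / np.sum(bias_axis ** 2)) * bias_axis
    # Rescale the bias components symmetrically around mu_B (equation (9)).
    coef = np.sqrt(np.abs(1.0 - np.sum(mu_orth ** 2)))
    corrected_e_w1B = coef * (e_w1B - mu_B) / np.linalg.norm(e_w1 - mu_orth - mu_B)
    corrected_e_w2B = coef * (e_w2B - mu_B) / np.linalg.norm(e_w2 - mu_orth - mu_B)
    # Both results share mu_orth and carry opposite bias components.
    return corrected_e_w1B + mu_orth, corrected_e_w2B + mu_orth

def cos_sim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
g = rng.standard_normal(50)
e1, e2 = equalize_sketch(rng.standard_normal(50), rng.standard_normal(50), g)
print(cos_sim(e1, g), cos_sim(e2, g))  # same magnitude, opposite sign
```

The magnitudes differ from pair to pair, but the two numbers are always negatives of each other up to rounding, which matches the outputs above.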


Any updates on this thread?

I’m getting the same results for equalize(), i.e. a cosine similarity of ±0.7004364289309388 for man and woman.

Another thing I would like to mention here: I get the following result for neutralize():

cosine similarity between receptionist and g, after neutralizing:  -3.8581292784004314e-17

whereas the expected outcome is:

cosine similarity between receptionist and g, after neutralizing:  -4.442232511624783e-17 

Is that relevant?


I’ve been reading the reference paper for the debiasing algorithms: Bolukbasi et al. (2016), “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”.

I’m confused here:

Should we be applying neutralize() before computing the mean? I can’t actually figure out how the neutralize() in the notebook applies the first equation from the screenshot above.

Also, I think corrected_e_wB may not require mu_ortho to compute the norm in the denominator.

(I’ve tried these variations, but the results were almost identical to the ones above.)

Can someone explain what’s happening? - @Kic
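On the neutralize() question: values like -3.86e-17 and -4.44e-17 are both “zero up to floating-point rounding”. neutralize() subtracts the projection of the embedding onto g, so in exact arithmetic the result is orthogonal to g; what remains is rounding noise whose exact value depends on the order of operations. A minimal sketch of that projection (hypothetical names, random vectors instead of GloVe embeddings):

```python
import numpy as np

def neutralize_sketch(e, g):
    # Project e onto the bias axis g and remove that component:
    # e_bias = (e . g / ||g||^2) * g
    e_bias_component = (np.dot(e, g) / np.sum(g * g)) * g
    return e - e_bias_component

rng = np.random.default_rng(0)
e = rng.standard_normal(50)
g = rng.standard_normal(50)

e_debiased = neutralize_sketch(e, g)
cos = np.dot(e_debiased, g) / (np.linalg.norm(e_debiased) * np.linalg.norm(g))
print(cos)  # tiny, e.g. ~1e-17: zero up to rounding
```

So an e-17 mismatch against the notebook’s “expected output” only tells you the rounding happened in a slightly different order, not that the projection is wrong.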


Hi everyone,

I have just checked to see if any progress has been made. All I can report back here is that the issue ticket has been allocated for investigation. If there is any update, I will report back ASAP.


Hi everyone,

I want to inform you that we have finally fixed the problem. In the implementation we were using to generate the “expected values”, the denominator of three expressions used the norm of the vectors, whereas the formulas call for norm**2. I can confirm that the expected values are -0.7000 and 0.7000. Thanks for your patience and support.


Hi @andres.castillo,

Many thanks for checking this query and the update.


I had the same issue with the neutralize() function where I saw -3.858...e-17 instead of the expected -4.442...e-17. I figured out the issue for that one. Instead of using

np.linalg.norm(g) ** 2

I changed the implementation to use

np.sum(g * g)

And that seems to have “fixed” the mismatch issue, although I don’t know why, since the former formula should also work, in my opinion.
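A likely explanation: the two expressions are mathematically identical but not bit-identical. np.linalg.norm(g) takes a square root, and ** 2 squares it back, so the result picks up an extra rounding step compared with summing the squares directly. Since the “correct” answer after neutralizing is zero, that last-bit difference is exactly what shows up at the e-17 scale. A quick illustration on a random vector (not the course data):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.standard_normal(50)

a = np.linalg.norm(g) ** 2  # sqrt of the sum of squares, then squared: extra roundings
b = np.sum(g * g)           # sum of squares directly: no square root involved

print(a - b)             # typically a few ulps, i.e. around 1e-14 for values near 50
print(np.isclose(a, b))  # True: the difference is pure rounding noise
```

Either expression is fine mathematically; they simply disagree in the last bit, which is why the notebook’s hard-coded e-17 value only matches one of them.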


Thanks. I agree with your finding, but I have no idea why the .norm() formula doesn’t work either.

I’ve tried expanding your .norm() expression to np.square(np.linalg.norm(g, ord=2)), though as expected this still gives -3.858…e-17.

Hope someone could point out what the issue is.


I get ±0.23 after equalising; I only get the ±0.700 answer after removing the \mu_{\perp} in the denominator of the expression for e_{w1/2B}^{corrected}.

So question: are the expressions in the assignment for e_{w1/2B}^{corrected} wrong? They certainly don’t line up with the ‘Step 2a’ quoted by jincy-p-janardhanan above…


I’m getting:

   cosine similarities after equalizing:
     cosine_similarity(e1, gender) =  -0.23142942644908535
     cosine_similarity(e2, gender) =  0.2314294264490853

Instead of:

   cosine similarities after equalizing:
     cosine_similarity(e1, gender) =  -0.7004364289309388
     cosine_similarity(e2, gender) =  0.7004364289309388

These are my intermediate results (only the first components of each vector are shown):

mu:              [-0.137958    0.53917    -0.37717    -0.4749      1.5931      0.874175
mu_B:            [-0.02300709  0.05760748 -0.10820807 -0.01035456 -0.02724607  0.24860716
mu_orth:         [-1.14950914e-01  4.81562520e-01 -2.68961926e-01 -4.64545438e-01
e_w1B:           [ 0.02056491 -0.05149252  0.09672193  0.00925544  0.02435393 -0.22221784
e_w2B:           [-0.06657909  0.16670748 -0.31313807 -0.02996456 -0.07884607  0.71943216
corrected_e_w1B: [ 0.04148609 -0.10387709  0.19511945  0.01867122  0.04912977 -0.44828534
corrected_e_w2B: [-0.04148609  0.10387709 -0.19511945 -0.01867122 -0.04912977  0.44828534

In general I’m using:

    np.linalg.norm(A)       for the norm

and

    np.linalg.norm(A)**2    for the squared norm

What am I doing wrong?


@andres.castillo

Do you see anything wrong here? Can you help me?


Hi @martin75carames ,

Send me a direct message with your code for that function, I will have a look for you.


@martin75carames I am getting the same result as you did. Any hint as to what was wrong with your code?


Welcome to the community.

Please look at the equation closely.

e_{w1B}^{corrected} = \sqrt{\left| 1 - \|\mu_{\perp}\|_2^2 \right|} \cdot \frac{e_{w1B} - \mu_B}{\left\| (e_{w1} - \mu_{\perp}) - \mu_B \right\|_2} \tag{9}

In your case, I guess the error is in the denominator on the right. Most likely you used \mu instead of \mu_B. In any case, the error is in your code that calculates “corrected_e_w1B” and “corrected_e_w2B” if the output is the same as in Martin’s case.

Hope this helps.
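To see why that denominator matters, note that mu = mu_B + mu_orth, so the correct argument e_w1 - mu_orth - mu_B collapses to e_w1 - mu, while substituting \mu for \mu_B shifts it by an extra mu_orth and produces a different value. A toy check with random vectors (hypothetical names, not the grader’s code):

```python
import numpy as np

rng = np.random.default_rng(0)
e_w1 = rng.standard_normal(50)
e_w2 = rng.standard_normal(50)
g = rng.standard_normal(50)

mu = (e_w1 + e_w2) / 2.0
mu_B = (np.dot(mu, g) / np.sum(g * g)) * g
mu_orth = mu - mu_B

denom_correct = np.linalg.norm(e_w1 - mu_orth - mu_B)  # equals ||e_w1 - mu||
denom_buggy = np.linalg.norm(e_w1 - mu_orth - mu)      # off by an extra mu_orth

print(denom_correct, np.linalg.norm(e_w1 - mu))  # these two agree
print(denom_buggy)                               # this one is a different number
```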


Thanks for your reply. Unfortunately I could not identify the error in the denominator. From your reply I understand that, for all the other values (mu, mu_B, mu_orth, e_w1B, e_w2B), the expected output is what is shown in Martin’s post, right?
