C5_W2 Embeddings/Word Vectors: gender vector average doesn't seem right, or have I done it correctly?

I tried to average the gender vector, and its unit-vector counterpart, over 5 word-pair differences:

_gender = (_word_to_vec_map['female'] - _word_to_vec_map['male']
           + _word_to_vec_map['woman'] - _word_to_vec_map['man']
           + _word_to_vec_map['mother'] - _word_to_vec_map['father']
           + _word_to_vec_map['girl'] - _word_to_vec_map['boy']
           + _word_to_vec_map['gal'] - _word_to_vec_map['guy']) / 5

_bias_axis = (_word_to_vec_map_unit_vectors['female'] - _word_to_vec_map_unit_vectors['male']
              + _word_to_vec_map_unit_vectors['woman'] - _word_to_vec_map_unit_vectors['man']
              + _word_to_vec_map_unit_vectors['mother'] - _word_to_vec_map_unit_vectors['father']
              + _word_to_vec_map_unit_vectors['girl'] - _word_to_vec_map_unit_vectors['boy']
              + _word_to_vec_map_unit_vectors['gal'] - _word_to_vec_map_unit_vectors['guy']) / 5
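Equivalently, the same two averages can be written as a loop over the 5 pairs (just a sketch, assuming both maps hold NumPy arrays):

import numpy as np

pairs = [('female', 'male'), ('woman', 'man'), ('mother', 'father'),
         ('girl', 'boy'), ('gal', 'guy')]

# Element-wise mean of the 5 difference vectors, for both the raw and the unit-vector maps
_gender = np.mean([_word_to_vec_map[f] - _word_to_vec_map[m] for f, m in pairs], axis=0)
_bias_axis = np.mean([_word_to_vec_map_unit_vectors[f] - _word_to_vec_map_unit_vectors[m]
                      for f, m in pairs], axis=0)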

These are their values:

_gender: [ 0.139252    0.2494736  -0.077044    0.078686   -0.338172    0.5356952
  0.24755616 -0.014782    0.28579372 -0.03800272  0.141274   -0.543942
  0.4904082   0.212256    0.050238   -0.0949008  -0.423742    0.0533926
  0.3795708   0.30802     0.329332    0.252952    0.2486384   0.1790466
  0.033638    0.247894   -0.0144      0.064134   -0.258742   -0.1316492
 -0.3956292   0.1423458  -0.17959     0.11332157 -0.1289458  -0.089151
 -0.15220774 -0.2624756   0.205116    0.0670106  -0.1386252  -0.212921
  0.4942532  -0.441349    0.106379   -0.3074928   0.236484    0.174356
  0.0898276  -0.2535992 ]
_bias_axis: [ 2.49118998e-02  4.27001208e-02 -3.62042235e-03  1.76584409e-02
 -7.70039035e-02  9.81583413e-02  6.29461511e-02 -1.06418055e-02
  4.19868767e-02 -1.90138046e-07  2.76648796e-02 -1.04429346e-01
  8.40711809e-02  4.76526419e-02  4.07180327e-03 -2.51125919e-02
 -6.98786537e-02  2.21829724e-02  7.06141474e-02  6.43192103e-02
  6.06012762e-02  4.27772218e-02  5.24388473e-02  3.56686583e-02
 -5.34867687e-03  5.44789744e-02 -3.90208726e-03  1.17964080e-02
 -5.44582872e-02 -2.00531517e-02 -9.30823421e-02  3.18587190e-02
 -4.41640081e-02  3.12046311e-02 -2.80779818e-02 -1.73396384e-02
 -3.13046072e-02 -5.58027254e-02  5.26828848e-02  2.51920011e-02
 -2.14687110e-02 -3.59139215e-02  9.52044363e-02 -8.42399699e-02
  1.96694413e-02 -6.21964372e-02  4.77032156e-02  5.22517069e-02
  2.56968043e-02 -4.64346020e-02], sum: 0.35368983195653886

It doesn’t seem right to me because the difference before and after debiasing is not as large as I expected. Here is the console output:

      === _gender ===       === _bias_axis ===
john: -0.5267322464703796, -0.05837690255727943
marie: 0.1364852833862198, -0.04001224749084058
sophie: 0.15609092129003208, -0.050421491142398904
ronaldo: -0.32688543526742, 0.0022154029435832306
priya: 0.1652718134821406, -0.007175382706850162
rahul: -0.1847808381696328, 0.005864579497628745
danielle: 0.12541822126856073, -0.033764062620807604
reza: -0.004458757928913372, 0.018608391991402646
katy: 0.1534938102017636, -0.05347238532685372
yasmin: 0.23648489955406324, 0.005900555028932008

         === _gender ===      === _bias_axis ===
lipstick: 0.2563283006136778, -0.02263324170915695
guns: -0.15957118496219078, -0.005352090879613553
science: -0.07265210850307718, -0.012818659494469132
arts: -0.0746817929641194, -0.025242006934876868
literature: -0.0029044288625196947, -0.027414584425058015
warrior: -0.2656626256591623, -0.02639334294575092
doctor: -0.060041760137426764, -0.03877619128207007
tree: -0.13192518011711302, -0.0538317267384064
receptionist: 0.15632312433979648, -0.025266318177526354   <- XXX
technology: -0.19612164276764785, -0.01943173405850256
fashion: -0.19344489188557096, -0.04750008073816098
teacher: -0.0519019259869694, -0.04052553419903423
engineer: -0.2560882371499192, -0.01888079962866415
pilot: -0.13153848690688022, -0.032098185362752565
computer: -0.2576835290332658, -0.02636802889445321
singer: 0.0005360216386984724, -0.056720780195697845
scientist: -0.10163206796226365, -0.00636415049168681

The numbers in the _gender column are the cosine_similarity of each word with _gender before debiasing, while the ones in the _bias_axis column are the cosine_similarity between the debiased word and the L2-normalized (unit-length) _bias_axis.
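In code, the two columns correspond to something like this (a sketch with a placeholder word list; cosine_similarity and neutralize are the notebook's helpers, and I pass the unit-vector map to neutralize):

import numpy as np

words = ['john', 'marie', 'sophie']  # placeholder list
for w in words:
    before = cosine_similarity(word_to_vec_map[w], _gender)               # raw embedding vs. averaged gender vector
    e_debiased = neutralize(w, _bias_axis, word_to_vec_map_unit_vectors)  # debias against the averaged axis
    after = cosine_similarity(e_debiased, _bias_axis / np.linalg.norm(_bias_axis))  # vs. unit-length bias axis
    print(f"{w}: {before}, {after}")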

For example, compare the value for receptionist with the values in the Colab notebook:

cosine similarity between receptionist and g, before neutralizing:  0.3307794175059374
cosine similarity between receptionist and g_unit, after neutralizing:  3.399606663925154e-17

Any insight on this?

My guess is that there is some issue in the way you implemented the debiasing with your new “average of 5” vectors.

I tried it and here’s what I got using your _bias_axis vector, based on averaging the differences of the 5 pairs of unit vectors:

word = "receptionist"
print("cosine similarity between " + word + " and _bias_axis, before neutralizing: ", cosine_similarity(word_to_vec_map[word], _bias_axis))

e_debiased = neutralize(word, _bias_axis, word_to_vec_map_unit_vectors)
print("cosine similarity between " + word + " and _bias_axis, after neutralizing: ", cosine_similarity(e_debiased, _bias_axis))

cosine similarity between receptionist and _bias_axis, before neutralizing:  0.12951769059046733
cosine similarity between receptionist and _bias_axis, after neutralizing:  2.7045672243942412e-17

So I’d say the results are very comparable to what the notebook shows with the gender axis computed using only one pair of “female, male” words.

I found what the issue was. neutralize() uses g everywhere in the code, and the consumer of the function passes in the unit vectors of both g and the vector maps. I missed that little detail…
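For anyone else who runs into this: the projection inside neutralize() has to use the same axis and the same vector map that you later measure against. A minimal sketch of the projection step (my own rewrite with an assumed name, not the graded code):

import numpy as np

def neutralize_sketch(word, g, word_to_vec_map):
    """Remove the component of the word's embedding that lies along the bias direction g."""
    e = word_to_vec_map[word]
    e_biascomponent = (np.dot(e, g) / np.sum(g * g)) * g  # projection of e onto g
    return e - e_biascomponent

# Consistency matters: pair the unit-vector map with the unit-based bias axis.
# e_debiased = neutralize_sketch('receptionist', _bias_axis, word_to_vec_map_unit_vectors)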