I am trying to compute the gender direction, and a unit-vector version of it, by averaging over 5 word-pair differences:
pairs = [('female', 'male'), ('woman', 'man'), ('mother', 'father'), ('girl', 'boy'), ('gal', 'guy')]
_gender = sum(_word_to_vec_map[f] - _word_to_vec_map[m] for f, m in pairs) / 5
_bias_axis = sum(_word_to_vec_map_unit_vectors[f] - _word_to_vec_map_unit_vectors[m] for f, m in pairs) / 5
These are their values:
_gender: [ 0.139252 0.2494736 -0.077044 0.078686 -0.338172 0.5356952
0.24755616 -0.014782 0.28579372 -0.03800272 0.141274 -0.543942
0.4904082 0.212256 0.050238 -0.0949008 -0.423742 0.0533926
0.3795708 0.30802 0.329332 0.252952 0.2486384 0.1790466
0.033638 0.247894 -0.0144 0.064134 -0.258742 -0.1316492
-0.3956292 0.1423458 -0.17959 0.11332157 -0.1289458 -0.089151
-0.15220774 -0.2624756 0.205116 0.0670106 -0.1386252 -0.212921
0.4942532 -0.441349 0.106379 -0.3074928 0.236484 0.174356
0.0898276 -0.2535992 ]
_bias_axis: [ 2.49118998e-02 4.27001208e-02 -3.62042235e-03 1.76584409e-02
-7.70039035e-02 9.81583413e-02 6.29461511e-02 -1.06418055e-02
4.19868767e-02 -1.90138046e-07 2.76648796e-02 -1.04429346e-01
8.40711809e-02 4.76526419e-02 4.07180327e-03 -2.51125919e-02
-6.98786537e-02 2.21829724e-02 7.06141474e-02 6.43192103e-02
6.06012762e-02 4.27772218e-02 5.24388473e-02 3.56686583e-02
-5.34867687e-03 5.44789744e-02 -3.90208726e-03 1.17964080e-02
-5.44582872e-02 -2.00531517e-02 -9.30823421e-02 3.18587190e-02
-4.41640081e-02 3.12046311e-02 -2.80779818e-02 -1.73396384e-02
-3.13046072e-02 -5.58027254e-02 5.26828848e-02 2.51920011e-02
-2.14687110e-02 -3.59139215e-02 9.52044363e-02 -8.42399699e-02
1.96694413e-02 -6.21964372e-02 4.77032156e-02 5.22517069e-02
2.56968043e-02 -4.64346020e-02], sum: 0.35368983195653886
This doesn't seem right to me, because the change before and after debiasing is not as large as expected. Here is the console output:
=== _gender === === _bias_axis ===
john: -0.5267322464703796, -0.05837690255727943
marie: 0.1364852833862198, -0.04001224749084058
sophie: 0.15609092129003208, -0.050421491142398904
ronaldo: -0.32688543526742, 0.0022154029435832306
priya: 0.1652718134821406, -0.007175382706850162
rahul: -0.1847808381696328, 0.005864579497628745
danielle: 0.12541822126856073, -0.033764062620807604
reza: -0.004458757928913372, 0.018608391991402646
katy: 0.1534938102017636, -0.05347238532685372
yasmin: 0.23648489955406324, 0.005900555028932008
=== _gender === === _bias_axis ===
lipstick: 0.2563283006136778, -0.02263324170915695
guns: -0.15957118496219078, -0.005352090879613553
science: -0.07265210850307718, -0.012818659494469132
arts: -0.0746817929641194, -0.025242006934876868
literature: -0.0029044288625196947, -0.027414584425058015
warrior: -0.2656626256591623, -0.02639334294575092
doctor: -0.060041760137426764, -0.03877619128207007
tree: -0.13192518011711302, -0.0538317267384064
receptionist: 0.15632312433979648, -0.025266318177526354 <- XXX
technology: -0.19612164276764785, -0.01943173405850256
fashion: -0.19344489188557096, -0.04750008073816098
teacher: -0.0519019259869694, -0.04052553419903423
engineer: -0.2560882371499192, -0.01888079962866415
pilot: -0.13153848690688022, -0.032098185362752565
computer: -0.2576835290332658, -0.02636802889445321
singer: 0.0005360216386984724, -0.056720780195697845
scientist: -0.10163206796226365, -0.00636415049168681
The numbers in the _gender column are the cosine_similarity of each word with _gender before debiasing, while the ones in the _bias_axis column are the cosine_similarity between the debiased word and the L2-normalized (unit-length) _bias_axis.
For example, compare the value for receptionist with the values in the Colab notebook:
cosine similarity between receptionist and g, before neutralizing: 0.3307794175059374
cosine similarity between receptionist and g_unit, after neutralizing: 3.399606663925154e-17
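For reference, this is a minimal sketch of the two computations being compared: cosine_similarity as defined in the notebook, and neutralize using the standard projection formula e_debiased = e - (e·g / ||g||²)·g. Toy random 50-d vectors stand in for the real embeddings:

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = u . v / (||u|| * ||v||)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def neutralize(e, g):
    # Remove the component of e along the bias axis g:
    #   e_debiased = e - (e . g / ||g||^2) * g
    return e - (np.dot(e, g) / np.dot(g, g)) * g

# Toy stand-ins for the real vectors.
rng = np.random.default_rng(1)
g = rng.standard_normal(50)   # bias axis
e = rng.standard_normal(50)   # word vector, e.g. "receptionist"

before = cosine_similarity(e, g)
after = cosine_similarity(neutralize(e, g), g)
# `after` should be ~0 up to float round-off, like the notebook's 3.4e-17.
```

Note that neutralize divides by ||g||², so g need not be unit length; if g is already a unit vector, the formula reduces to e - (e·g)·g.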
Any insight on this?