In week 2, in the lecture on the GloVe model, Andrew arrived at this equation:
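
Assuming the equation meant here is the standard GloVe objective, written with the symbols used in this question (theta_i and e_j for the two word vectors, X_{i j} for the co-occurrence count), it would read as follows. The weighting function f and the bias terms b_i and b'_j are my recollection of the lecture's notation and may differ slightly from the slide:

$$
J = \sum_{i} \sum_{j} f(X_{i j}) \left( \theta_i^{T} e_j + b_i + b'_j - \log X_{i j} \right)^2
$$

Here f(X_{i j}) is taken to be 0 when X_{i j} = 0, so pairs that never co-occur drop out of the sum and log(0) is never evaluated.
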
However, I don't fully understand the intuition behind this loss function.
How do we arrive at this relationship between the inner product of the feature vectors and log(X_{i j})?
With this loss function, are we optimising both theta_i and e_j, or only e_j?