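In the Week 2 lecture on the GloVe model, Andrew arrived at this equation:

$$\text{minimize} \sum_{i=1}^{10{,}000} \sum_{j=1}^{10{,}000} f(X_{ij}) \left( \theta_i^T e_j + b_i + b_j' - \log X_{ij} \right)^2$$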

However, I don't fully understand the intuition behind this loss function.

How do we arrive at the relationship between the inner product of the two vectors, $\theta_i^T e_j$, and $\log X_{ij}$?

With this loss function, are we optimising both $\theta_i$ and $e_j$, or just $e_j$?
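For reference, here is a minimal NumPy sketch of the weighted least-squares objective $\sum_{i,j} f(X_{ij})(\theta_i^T e_j + b_i + b_j' - \log X_{ij})^2$ together with its gradients. The function names `weighting` and `glove_loss_and_grads` are hypothetical, the weighting constants `x_max=100` and `alpha=0.75` are the values reported in the GloVe paper, and plain gradient descent on a tiny random count matrix is an assumption for illustration (the paper itself trains with AdaGrad). Written this way, the gradient expressions show that the loss depends on $\theta$, $e$ and both bias vectors, so all of them receive updates.

```python
import numpy as np

def weighting(x, x_max=100.0, alpha=0.75):
    # Hypothetical helper for f(X_ij): 0 when X_ij = 0, grows as (x/x_max)^alpha,
    # and is capped at 1 for very frequent pairs.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss_and_grads(X, theta, e, b, b_prime):
    """X: (V, V) co-occurrence counts; theta, e: (V, d); b, b_prime: (V,)."""
    mask = X > 0
    # Take log only where X_ij > 0; f(0) = 0 removes the other terms anyway.
    log_X = np.log(np.where(mask, X, 1.0))
    f = weighting(X)
    # Residual r_ij = theta_i^T e_j + b_i + b'_j - log X_ij
    r = theta @ e.T + b[:, None] + b_prime[None, :] - log_X
    r = np.where(mask, r, 0.0)
    loss = np.sum(f * r ** 2)
    g = 2.0 * f * r                # dLoss / dr_ij
    grad_theta = g @ e             # row i: sum_j g_ij * e_j
    grad_e = g.T @ theta           # row j: sum_i g_ij * theta_i
    grad_b = g.sum(axis=1)
    grad_b_prime = g.sum(axis=0)
    return loss, grad_theta, grad_e, grad_b, grad_b_prime

# Toy demo (assumed setup): random counts, small vocabulary, plain gradient descent.
rng = np.random.default_rng(0)
V, d = 6, 3
X = rng.integers(0, 20, size=(V, V)).astype(float)
theta = 0.1 * rng.standard_normal((V, d))
e = 0.1 * rng.standard_normal((V, d))
b = np.zeros(V)
b_prime = np.zeros(V)
theta0, e0 = theta.copy(), e.copy()

loss0 = glove_loss_and_grads(X, theta, e, b, b_prime)[0]
lr = 0.01
for _ in range(500):
    loss, gt, ge, gb, gbp = glove_loss_and_grads(X, theta, e, b, b_prime)
    # Both theta and e (and both bias vectors) are updated every step.
    theta -= lr * gt
    e -= lr * ge
    b -= lr * gb
    b_prime -= lr * gbp
loss_final = glove_loss_and_grads(X, theta, e, b, b_prime)[0]
print(loss0, loss_final)
```

Because $\theta_i$ and $e_j$ enter the residual symmetrically, swapping the roles of the two matrices leaves the form of the loss unchanged; the sketch simply treats them as two separate parameter sets.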