Lost - why are we calculating "v"

If we already have the user features in the vector x_u and the movie features in the vector x_m, why do we also compute v_u and v_m?

That is because we hope that, after the transformation, the "v" vectors will give a better dot product than the "x" vectors. Keep in mind that "v" is going to be different from "x". Gradient descent pushes v_u and v_m toward values whose dot product is better than the dot product of x_u and x_m, where "better" means closer to the target rating we are trying to predict.
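To make this concrete, here is a minimal sketch in plain Python. It assumes each "network" is just a single linear map (W_u turns x_u into v_u, W_m turns x_m into v_m) instead of a multi-layer neural network, and the toy features, ratings, output size `k`, and learning rate are all made up for illustration. Note that x_u and x_m here have different lengths (4 and 3), so x_u · x_m is not even defined, while v_u and v_m share length `k`. The prediction is v_u · v_m, and gradient descent on the squared error adjusts W_u and W_m so that this dot product moves toward the rating.

```python
import random

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [dot(row, x) for row in W]

def init(rows, cols, rng):
    """Small random weights (hypothetical initialization scheme)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def train(data, user_dim, movie_dim, k=3, lr=0.05, epochs=300, seed=0):
    """Fit linear 'towers' W_u, W_m so that (W_u x_u) . (W_m x_m) ~ rating."""
    rng = random.Random(seed)
    W_u = init(k, user_dim, rng)
    W_m = init(k, movie_dim, rng)
    losses = []  # mean squared error per epoch
    for _ in range(epochs):
        total = 0.0
        for x_u, x_m, y in data:
            v_u = matvec(W_u, x_u)  # user vector, length k
            v_m = matvec(W_m, x_m)  # movie vector, length k
            err = dot(v_u, v_m) - y  # prediction minus target
            total += err ** 2
            # Gradient of err**2 w.r.t. W_u[i][j] is 2*err*v_m[i]*x_u[j],
            # and w.r.t. W_m[i][j] is 2*err*v_u[i]*x_m[j]. v_u and v_m were
            # computed before these updates, so both use the old weights.
            for i in range(k):
                for j in range(user_dim):
                    W_u[i][j] -= lr * 2 * err * v_m[i] * x_u[j]
                for j in range(movie_dim):
                    W_m[i][j] -= lr * 2 * err * v_u[i] * x_m[j]
        losses.append(total / len(data))
    return W_u, W_m, losses

# Hypothetical toy data: (user features, movie features, rating scaled to 0..1).
data = [
    ([1.0, 0.0, 0.5, 0.2], [1.0, 0.0, 0.3], 0.9),
    ([0.0, 1.0, 0.1, 0.8], [0.0, 1.0, 0.6], 0.8),
    ([1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.6], 0.2),
    ([0.0, 1.0, 0.1, 0.8], [1.0, 0.0, 0.3], 0.1),
]

if __name__ == "__main__":
    W_u, W_m, losses = train(data, user_dim=4, movie_dim=3)
    print("first-epoch MSE:", losses[0])
    print("last-epoch MSE:", losses[-1])
```

Comparing the first and last epoch losses shows the point of the answer above: the learned v vectors give dot products much closer to the ratings than the starting ones do. In the actual model the linear maps would be replaced by neural networks, but the role of v_u and v_m is the same.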