I don’t really get it why adding multiple users allow us to introduce more variables (set features to be unknown).
Recall from Andrew, "By the way, notice that this (cost function J(w, b, x)) works only because we have parameters for four users. That’s what allows us to try to guess appropriate features, x_1. " Starts from 4:05, Collaborative filtering algorithm | Coursera
Originally, a linear regression MSE cost function J(w,b) is a quadratic form. After setting feature X to be a variable, the MSE cost function for collaborative filtering turns out to be a quartic form. It seems that having more users (samples?) will not change form of J, thus won’t affect whether or not gradient descent can apply.