If the first term (before the regularization term) of each cost function is the same, then when they are put together, why does the combined cost function still have 1/2 as the coefficient instead of 1?
If I understand the video on collaborative filtering correctly, the logic of the final cost function is: the first cost function uses known x and y to learn w and b, and the second uses known w, b, and y to learn x. Since the first term (before the regularization term) is the same in both cost functions, adding them together amounts to using known y to learn w, b, and x. Is my understanding correct?
Hello @flyunicorn, we don’t add them together. We put them together by combining a single error term with the two regularization terms.
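In the course's notation (sketching from memory, so treat the indices as approximate), the combined cost has one squared-error term with a single 1/2, plus one regularization term per set of trainable parameters:

```latex
J(x, w, b) =
\frac{1}{2} \sum_{(i,j):\, r(i,j)=1}
  \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2
+ \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2
+ \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2
```

Notice the error term appears once, so its 1/2 coefficient is unchanged; only the regularization terms are duplicated, one per parameter set.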
The 3 cost functions correspond to three separate cases:
the top one assumes only x and y are given as data
the middle one assumes only w, b, and y are given as data
the bottom one assumes only y is given as data
Obviously the three cases can’t co-exist, so when you think about the bottom one, you can’t assume the first two hold. In other words, the bottom one is not a sum of the first two. They are separate cases, and we form each of them by summing the error term and the regularization terms (applied to all trainable parameters).
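As a sketch of "summing the error and regularization over all trainable parameters," here is a minimal NumPy version of the bottom cost function. The function and variable names are my own for illustration, not from the course:

```python
import numpy as np

def collab_cost(X, W, b, Y, R, lam):
    """Combined collaborative-filtering cost (illustrative sketch).

    X : (num_movies, num_features) movie feature vectors
    W : (num_users, num_features)  per-user weight vectors
    b : (1, num_users)             per-user biases
    Y : (num_movies, num_users)    ratings matrix
    R : (num_movies, num_users)    indicator, 1 where a rating exists
    lam : regularization strength lambda
    """
    # Squared error counted only over the rated entries (r(i,j) = 1)
    err = (X @ W.T + b - Y) * R
    # One 1/2 on the single error term; one regularization term
    # per set of trainable parameters (W and X; b is not regularized)
    return 0.5 * np.sum(err ** 2) + (lam / 2) * (np.sum(W ** 2) + np.sum(X ** 2))
```

Note the error term appears exactly once, which is why the 1/2 coefficient survives in the combined cost.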
I suggest you watch the video again to see that Andrew was only discussing them as individual cases.
Since the bottom one is closest to reality, where in most cases we only know some users’ ratings on some movies and don’t know w, b, or x, why do we need the other two cases?