I am confused by how the tutor described maximum likelihood estimation, in which we maximize the log likelihood. Here P(y|x) depends on whether y = 0 or y = 1, so how can we say we need to maximize P(y|x)? If so, which likelihood are we maximizing: that of y = 1 or that of y = 0?
Please substitute the true label into the loss function and see for yourself that we are trying to align the predicted label with the true label.
Only one term remains, depending on the value of y.
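Here is a rough sketch of that substitution, assuming the loss in question is the usual binary cross-entropy (the negative log likelihood). The function name `bce_loss` and the prediction value 0.9 are made up for illustration and are not from the lecture.

```python
import math

def bce_loss(y, y_hat):
    """Binary cross-entropy for a single sample.

    y     : true label, 0 or 1
    y_hat : predicted probability that y = 1 given x (strictly between 0 and 1)
    """
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

y_hat = 0.9  # hypothetical model output P(y=1|x)

# True label is 1: the (1 - y) factor is 0, so only -log(y_hat) remains.
loss_when_y_is_1 = bce_loss(1, y_hat)

# True label is 0: the y factor is 0, so only -log(1 - y_hat) remains.
loss_when_y_is_0 = bce_loss(0, y_hat)

print(loss_when_y_is_1, loss_when_y_is_0)
```

With a prediction of 0.9, the loss is small when the true label is 1 and large when it is 0, so minimizing it pushes the prediction toward whichever label was actually observed.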
I am confused about how maximizing P(y=1|x) (that is, the probability that y = 1 given x) amounts to aligning the prediction P(y=1|x) with the true label y = 1.
There are two separate formulas: one for the case y = 1 and one for the case y = 0. As Balaji pointed out, only one of them is active for each sample.
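To write the two cases out, assume the model outputs $\hat{y} = P(y=1 \mid x)$ (the symbol $\hat{y}$ is my own notation here and may differ from the lecture's). The two formulas, and the single expression that combines them, are:

$$
\begin{aligned}
P(y \mid x) &= \hat{y} && \text{if } y = 1 \\
P(y \mid x) &= 1 - \hat{y} && \text{if } y = 0 \\
P(y \mid x) &= \hat{y}^{\,y}\,(1 - \hat{y})^{\,1 - y} && \text{both cases at once} \\
\log P(y \mid x) &= y \log \hat{y} + (1 - y)\log(1 - \hat{y})
\end{aligned}
$$

For a sample whose true label is 1, maximizing $\log P(y \mid x)$ means maximizing $\log \hat{y}$, which pushes $\hat{y}$ toward 1. For a sample whose true label is 0, it means maximizing $\log(1 - \hat{y})$, which pushes $\hat{y}$ toward 0. So we are never choosing between y = 1 and y = 0: for each sample we maximize the probability of the label that was actually observed.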