Please substitute the true label into the loss function and see for yourself that we are trying to align the predicted label with the true label.
Only one of the two terms will remain, depending on the value of y.
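Assuming the loss in question is the standard binary cross-entropy for a single sample, and writing $\hat{p} = p(y=1 \mid x)$ for the predicted probability (this notation is mine, not from the thread), the substitution works out as:

```latex
L(y, \hat{p}) = -\left[\, y \log \hat{p} + (1 - y) \log(1 - \hat{p}) \,\right]

y = 1: \quad L(1, \hat{p}) = -\log \hat{p}
y = 0: \quad L(0, \hat{p}) = -\log(1 - \hat{p})
```

For y = 1 the loss is smallest when $\hat{p} \to 1$; for y = 0 it is smallest when $\hat{p} \to 0$.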
I am confused: how does maximizing p(y=1|x) (that is, the probability that y = 1 given x) mean aligning the prediction p(y=1|x) with the true label y = 1?
There are two separate terms: one covers the case y = 1 and the other covers the case y = 0. As Balaji pointed out, only one of them is active for each sample.
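A minimal sketch, assuming the standard binary cross-entropy formula (the function name `bce` and the clipping constant are my own illustrative choices). It shows why maximizing p(y=1|x) aligns the prediction with the label: when y = 1 only the `-log(p_hat)` term survives, so lowering the loss means pushing `p_hat` toward 1. When y = 0 the other term takes over and pushes `p_hat` toward 0.

```python
import math


def bce(y: int, p_hat: float) -> float:
    """Binary cross-entropy for one sample.

    y     -- true label, 0 or 1
    p_hat -- predicted probability p(y=1|x)
    """
    eps = 1e-12  # clip to avoid log(0); the value is an arbitrary choice
    p_hat = min(max(p_hat, eps), 1 - eps)
    # When y == 1 the second term is multiplied by 0; when y == 0 the first is.
    return -(y * math.log(p_hat) + (1 - y) * math.log(1 - p_hat))


if __name__ == "__main__":
    for y in (1, 0):
        for p_hat in (0.1, 0.5, 0.9):
            print(f"y={y}  p_hat={p_hat:.1f}  loss={bce(y, p_hat):.3f}")
```

Running it shows the loss for y = 1 shrinking as `p_hat` grows, and the reverse for y = 0, so minimizing the loss moves the prediction toward whichever label is true.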
