In the loss function, I see that we use the following term for the negative class:

-(1-y_i) \cdot \ln(1 - f_{\vec w, b})

In logistic regression, we take the positive class y_i (the logistic-regression term) to be the favorable outcome (the probability term), and the logistic function gives the probability of that favorable outcome, f_{\vec w, b} (I am not sure why it does, and would like an answer to this as well). So, from the definition of probability, P_+(x) + P_-(x) = 1 \implies P_-(x) = 1 - P_+(x), and we use 1 - f_{\vec w, b} as the “leftover” probability of the negative class.

This is not exact, but it is roughly how I learned it. Is this a correct explanation?

Here is how I would think about it: the loss for each sample is simply -log(p). The minus sign is there so that a larger p gives a smaller loss.

Let’s say the model returns a as the predicted probability of the sample being positive.

If the sample’s label is positive, then p means “the probability that the model predicts it as positive”, and so p=a.

If the sample’s label is negative, then p means “the probability that the model predicts it as negative”, and so p=1-a.

Since a sample can be either positive or negative, we want the final loss function to adapt to either case, and we therefore have -ylog(a) - (1-y)log(1-a). Note that the coefficients y and 1-y control which term is enabled. If a sample is positive (y=1), then only the log(a) term is enabled; if it is negative (y=0), then only the log(1-a) term is enabled.
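To make the "switching" concrete, here is a minimal Python sketch, taking the per-sample loss as the negative log of p. The function names `per_sample_loss` and `loss_via_p` and the value a = 0.9 are just illustrative choices of mine, not from any library. It computes the loss both ways (the combined formula with y and 1-y as switches, and the "pick p first" view) so you can see they agree.

```python
import math

def per_sample_loss(y, a):
    """Combined formula: y and (1 - y) switch the two log terms on or off.

    y: true label, 1 for positive or 0 for negative
    a: predicted probability of the positive class, strictly between 0 and 1
    """
    return -y * math.log(a) - (1 - y) * math.log(1 - a)

def loss_via_p(y, a):
    """Same loss, written as -log(p), where p is the probability
    the model assigns to the sample's true label."""
    p = a if y == 1 else 1 - a
    return -math.log(p)

a = 0.9  # illustrative value: the model says "90% positive"

# Positive sample: only the log(a) term contributes.
print(per_sample_loss(1, a), loss_via_p(1, a))

# Negative sample: only the log(1 - a) term contributes.
print(per_sample_loss(0, a), loss_via_p(0, a))
```

With a = 0.9, the loss is small when the label is positive and much larger when the label is negative, which is the behavior we want: a confident wrong prediction is penalized heavily.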