Correct me if I’m wrong.
Your answer is:
−[0⋅log(0.7)+1⋅log(0.3)] = −log(0.3) ≈ 1.204
−[1⋅log(0.6)+0⋅log(0.4)] = −log(0.6) ≈ 0.511
Total Loss:
1.204+0.511 ≈ 1.715
However, my explanation is:
- Consider the sigmoid / binary cross-entropy view (a quick numeric check follows after this list):
True label 0 -> -(0*log(positive prob) + (1-0)*log(1-positive prob)) = -(0*log(0.3) + 1*log(0.7)) = -log(0.7)
True label 1 -> -(1*log(positive prob) + (1-1)*log(1-positive prob)) = -(1*log(0.6) + 0*log(0.4)) = -log(0.6)
- In the softmax / multi-class cross-entropy view, using either one-hot targets or nll_loss class indices:
True label 0 -> nll_loss with class index 0, or one-hot vector [1, 0] -> pick the class-0 probability: -log(0.7)
True label 1 -> nll_loss with class index 1, or one-hot vector [0, 1] -> pick the class-1 probability: -log(0.6)
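To make the binary formula concrete, here is a minimal sketch (plain Python, natural log, matching the figures above) that just plugs the two samples into -(y*log(p) + (1-y)*log(1-p)):

```python
import math

def binary_ce(y, p):
    # y: true label (0 or 1), p: predicted probability of the positive class
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(binary_ce(0, 0.3))   # 0.3567 = -log(0.7)
print(binary_ce(1, 0.6))   # 0.5108 = -log(0.6)
```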
I think both cross-entropy approaches should yield the same answer, namely -log(0.7) ≈ 0.357 for the first sample and -log(0.6) ≈ 0.511 for the second (total ≈ 0.867, not 1.715), which is exactly what I expected.
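As a quick sanity check of that claim (assuming nll_loss refers to torch.nn.functional.nll_loss, and that the per-class probabilities are [0.7, 0.3] and [0.4, 0.6], i.e. [1 - positive prob, positive prob]), both views give identical per-sample losses:

```python
import torch
import torch.nn.functional as F

p_pos = torch.tensor([0.3, 0.6])        # positive-class probabilities
labels = torch.tensor([0.0, 1.0])       # true labels

# Binary cross-entropy applied directly to the positive probability
bce = F.binary_cross_entropy(p_pos, labels, reduction='none')

# Multi-class view: stack [P(class 0), P(class 1)] and use nll_loss,
# which expects log-probabilities and integer class indices
probs = torch.stack([1 - p_pos, p_pos], dim=1)   # [[0.7, 0.3], [0.4, 0.6]]
nll = F.nll_loss(torch.log(probs), labels.long(), reduction='none')

print(bce)   # tensor([0.3567, 0.5108]) -> -log(0.7), -log(0.6)
print(nll)   # tensor([0.3567, 0.5108]) -> same values, total ≈ 0.8675
```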