In the first video of Week 3, Andrew used the squared error to compute the loss function, but he also mentioned that this is a simplified version and that in practice one would probably use cross-entropy loss for pc and squared error for bx, by, etc. I'm wondering: once we compute the losses for each individual output (pc, bx, by, ..., c1, c2, c3), do we sum all of those losses together to produce the final loss for one training example?
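To make the question concrete, here's a small sketch of what I have in mind (the function name and the exact form of each term are my own assumption, not taken from the lecture; the box/class terms are only counted when the ground-truth pc is 1):

```python
import numpy as np

def combined_loss(y_true, y_pred, eps=1e-7):
    """Hypothetical per-example loss for a YOLO-style label
    y = [pc, bx, by, bh, bw, c1, c2, c3]:
    binary cross-entropy for pc, squared error for the box
    coordinates and class scores, summed into one scalar."""
    pc_true = y_true[0]
    pc_pred = np.clip(y_pred[0], eps, 1 - eps)  # avoid log(0)
    # cross-entropy for object presence
    loss_pc = -(pc_true * np.log(pc_pred)
                + (1 - pc_true) * np.log(1 - pc_pred))
    # squared error for the remaining outputs, counted only
    # when the ground truth says an object is present
    loss_rest = pc_true * np.sum((y_true[1:] - y_pred[1:]) ** 2)
    return loss_pc + loss_rest  # is this summation the right way?

y_true = np.array([1.0, 0.5, 0.5, 0.2, 0.3, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.45, 0.55, 0.25, 0.35, 0.1, 0.8, 0.1])
print(combined_loss(y_true, y_pred))
```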
BRs,
hoang