I have a problem understanding loss functions. Can someone explain why Andrew says the following in the first video of Week 3:

In practice, you could probably use a log likelihood loss for
the c1, c2, c3 to the softmax output.
For those elements, usually you can use squared error or
something like squared error for the bounding box coordinates, and
for pc you could use something like the logistic regression loss.
Although even if you use squared error it’ll probably work okay.

The loss function to use in a given case is always a choice that you need to make. Whether you use a distance-based loss (e.g. MSE or MAE) or a “cross entropy” loss depends on what your output represents: a continuous real number, or something like a probability distribution (more common in “classification” problems).
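To make the distinction concrete, here is a minimal NumPy sketch of the two families of loss. The function names and the sample numbers are made up for illustration and are not taken from the course materials.

```python
import numpy as np

def mse(y_true, y_pred):
    """Distance-based loss: mean squared error between real-valued vectors."""
    return float(np.mean((y_true - y_pred) ** 2))

def cross_entropy(p_true, p_pred, eps=1e-12):
    """Cross entropy between a one-hot label and a predicted distribution."""
    p_pred = np.clip(p_pred, eps, 1.0)  # avoid log(0)
    return float(-np.sum(p_true * np.log(p_pred)))

# Continuous outputs (e.g. box coordinates): a distance-based loss fits.
box_true = np.array([0.5, 0.5, 0.2, 0.3])
box_pred = np.array([0.4, 0.6, 0.2, 0.3])
box_loss = mse(box_true, box_pred)

# Probability-like outputs (e.g. softmax over 3 classes): cross entropy fits.
class_true = np.array([0.0, 1.0, 0.0])
class_pred = np.array([0.1, 0.7, 0.2])
class_loss = cross_entropy(class_true, class_pred)

print(box_loss, class_loss)
```

With a one-hot label, the cross entropy reduces to the negative log of the probability assigned to the correct class, so a confident wrong prediction is penalized much more heavily than it would be by squared error.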

The case of YOLO is interesting in that the model produces several different types of output, including bounding boxes and classifications (pedestrian, car, tree, stop light …). So you may need a “hybrid” approach to the cost function, in which you add a separate term for each aspect of the output. Maybe the bounding box needs something like MSE, whereas the classification of the contents needs something more like cross entropy. I think that is what Prof Ng was getting at in the section that you quote.
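A rough sketch of such a hybrid loss for a single detection vector might look like the following. It assumes the layout `[pc, bx, by, bh, bw, c1, c2, c3]`, an unweighted sum of the three terms, and that the box and class terms are skipped when no object is present. The function name `hybrid_loss` is hypothetical, and this is not the loss used by any actual YOLO implementation, which weights and formulates the terms differently.

```python
import numpy as np

def hybrid_loss(y_true, y_pred, eps=1e-12):
    """Illustrative loss for one vector [pc, bx, by, bh, bw, c1, c2, c3]."""
    # Objectness pc: logistic regression (binary cross entropy) loss.
    pc_true = y_true[0]
    pc_pred = np.clip(y_pred[0], eps, 1.0 - eps)
    pc_loss = -(pc_true * np.log(pc_pred)
                + (1.0 - pc_true) * np.log(1.0 - pc_pred))

    # Assumption: with no object present, the other outputs are "don't care".
    if pc_true == 0:
        return float(pc_loss)

    # Bounding box coordinates: squared error.
    box_loss = np.sum((y_true[1:5] - y_pred[1:5]) ** 2)

    # Class scores (softmax output): log likelihood / cross entropy loss.
    class_pred = np.clip(y_pred[5:], eps, 1.0)
    class_loss = -np.sum(y_true[5:] * np.log(class_pred))

    return float(pc_loss + box_loss + class_loss)

# Object present: all three terms contribute.
y_true = np.array([1.0, 0.5, 0.5, 0.2, 0.3, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.4, 0.6, 0.2, 0.3, 0.1, 0.7, 0.2])
loss_object = hybrid_loss(y_true, y_pred)

# No object: only the pc term contributes.
y_empty = np.zeros(8)
y_empty_pred = np.array([0.1, 0.3, 0.3, 0.3, 0.3, 0.4, 0.3, 0.3])
loss_empty = hybrid_loss(y_empty, y_empty_pred)

print(loss_object, loss_empty)
```

In a real model you would typically also weight the terms against each other, since the squared error on coordinates and the cross entropy on classes are not on the same scale.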

Here’s another recent thread that just talks in general about the differences between distance and entropy style loss functions, but not specific to YOLO.