Cost Function For Multi-Task Learning Needs Explanation


In Course 3, Week 2, Chapter: Learning from Multiple Tasks, Video: Multi-task Learning, at the 2:40 mark.
Why is the cost function averaged by dividing by m and not by 4m, since there are 4 outputs for each example?

Thank you.

Hi @medbenchohra and welcome to Discourse. The cost function is averaged over the number of samples (the batch size), which is m. This is the convention, and it doesn't matter how many outputs the network has. You want to average the total error of the network (in this case, the inner sum over the 4 outputs) over all the samples, so you divide by m.
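A minimal NumPy sketch of this convention (the variable names and random data are my own, not from the course): sum the logistic loss over the 4 outputs of each example, then average over the m examples only. It also shows that dividing by 4m instead would merely rescale the cost by a constant.

```python
import numpy as np

# Hypothetical setup: m examples, each with 4 binary labels (multi-task).
m, n_tasks = 8, 4
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=(m, n_tasks)).astype(float)  # true labels
y_hat = rng.uniform(0.01, 0.99, size=(m, n_tasks))       # predicted probabilities

# Inner sum: total logistic loss over the 4 outputs of each example.
per_example = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)).sum(axis=1)

# Outer average: divide by m (the number of samples), not 4*m.
J = per_example.sum() / m

# Dividing by 4*m instead would just scale the cost by the constant 1/4,
# leaving the location of the minimum unchanged.
J_alt = per_example.sum() / (4 * m)
assert np.isclose(J_alt, J / 4)
```

Because the 1/(4m) version differs from the 1/m version only by a constant factor, both lead to the same optimal parameters; the constant simply rescales the gradients, which can be absorbed into the learning rate.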

1 Like

Hi @yanivh, thanks for your answer.

It’s much clearer now: the cost should be averaged over the number of samples.
What got me confused is the inner sum over the outputs: why shouldn’t that be averaged by the number of outputs? Is it wrong to take the network’s error to be the average of the outputs’ losses rather than their sum, or to use any other metric for that matter, such as the quadratic mean? And would that have any substantial effect on the performance achieved?

Thank you.