Course 1 Week 4: Cost / Loss at the end of the forward pass

I tried to redo the forward/backward cycle from the lecture, marking the matrix sizes I have everywhere based on a few assumptions (m=2, n[0]=3, n[1]=4, n[2]=3, n[3]=1).

Everything makes sense except the loss, or rather the cost? Yeah, here is the problem. I figured out what size dA[last] has to be to make it work. But here is my question:

How do I calculate the loss? Do I give pairs of elements to my loss function and then put the results in a vector? Yeah, alright, that makes sense. But what about the cost? At the end of the day we want to minimize the COST, not the individual losses.

I don’t know how to make my question clearer. I just want to know what exactly is happening at the end of the forward pass (my gut is telling me those are connected and it makes sense, but my brain doesn’t).

My thought process as a gif

This was all covered in the lectures and is apparent in the formulas you show and those shown in the notebook. For each sample, the cross-entropy loss formula gives you a single scalar loss value. The cost is the average of those loss values across all m samples. Back propagation then computes the gradients of that cost, and gradient descent uses them to minimize it, which means all the samples are taken into account.
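To make that concrete, here is a minimal sketch using the sizes from the question (m=2, n[3]=1, so A[3] and Y each hold one value per sample). The activation values, the labels, and the function names are made up for illustration and are not the notebook's code. It assumes the binary cross-entropy loss and uses plain Python lists instead of NumPy arrays.

```python
import math

def cross_entropy_losses(a_last, y):
    """Return one scalar loss per sample (binary cross-entropy)."""
    return [-(yi * math.log(ai) + (1 - yi) * math.log(1 - ai))
            for ai, yi in zip(a_last, y)]

def cost(a_last, y):
    """Cost = average of the per-sample losses, a single scalar."""
    losses = cross_entropy_losses(a_last, y)
    return sum(losses) / len(losses)

def d_a_last(a_last, y):
    """Derivative of each sample's loss w.r.t. its own activation.

    The result has one entry per sample, i.e. the same shape as A[last].
    """
    return [-(yi / ai) + (1 - yi) / (1 - ai) for ai, yi in zip(a_last, y)]

# Hypothetical values: A[3] has shape (1, m) with m = 2.
A3 = [0.8, 0.3]   # made-up output activations
Y = [1, 0]        # made-up labels

losses = cross_entropy_losses(A3, Y)  # vector of m losses
J = cost(A3, Y)                       # one scalar for the whole batch
dA3 = d_a_last(A3, Y)                 # m entries, same shape as A3

print(losses, J, dA3)
```

So the loss is a vector with one entry per sample, the cost collapses it to one number, and dA[last] goes back to one entry per sample because each activation only affects its own loss. If I recall the notebook correctly, the 1/m factor from the averaging is applied later, when dW and db are computed, rather than inside dA[last].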