Yes, the point is that the loss is always the sum of the losses for the individual training samples. The max is applied only at the level of each individual sample's value, to ensure that each sample's contribution is non-negative. So it should be clear from looking at the formula that the sum is the “outer” operation.
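Here is a minimal sketch of why the order matters. The numbers and function names are made up for illustration and are not from the assignment; `raw` just stands in for the per-sample values before the max is applied.

```python
# Hypothetical per-sample values before clamping (illustrative only).
raw = [0.5, -2.0, 1.5]

def loss_sum_outer(values):
    """Clamp each sample's value at zero, then sum: sum is the outer operation."""
    return sum(max(v, 0.0) for v in values)

def loss_max_outer(values):
    """Sum first, then clamp once: the mistaken ordering."""
    return max(sum(values), 0.0)

correct = loss_sum_outer(raw)   # 0.5 + 0.0 + 1.5
mistaken = loss_max_outer(raw)  # max(0.5 - 2.0 + 1.5, 0.0)
print(correct, mistaken)        # prints "2.0 0.0"
```

With the max on the outside, the negative sample cancels out the positive ones before the clamp is applied, so the two orderings give different answers whenever the per-sample values have mixed signs.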
Glad to hear that it makes sense now. If it’s any comfort, you are far from the first person to step on that landmine.