Is the 1/m term missing from the cost function here? Also, how do the partial derivatives get computed for the cost function, given that it is not in “one piece”? Before, we used the 1-y trick to put it together. Does the same thing happen here, just with more terms?

Since this is the softmax calculation, there’s no 1/m term required.
This is simply re-scaling the output values from logistic regression, so they all sum to 1.

I thought the m term came from the samples, which is independent of the fact that it’s softmax, right? I thought the leftmost summation, from 1 to m, was accumulating the losses, which would then get averaged over the batch. I can understand the a vector adding up to 1, but I do not understand how the losses would add up that way. Thank you!

The 1/m term is not needed for the softmax loss of a single sample, but I think we do need the 1/m for the cost over all samples. I will share this with the course team.
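
To make the distinction concrete, here is a minimal sketch in plain Python. It assumes the usual softmax cross-entropy loss with integer class labels starting at 0; the function names (`softmax`, `sample_loss`, `cost`) and the example numbers are made up for illustration and are not taken from the course code.

```python
import math

def softmax(z):
    """Turn a list of logits into probabilities that sum to 1."""
    shift = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sample_loss(z, y):
    """Cross-entropy loss for ONE sample with true class y; no 1/m here."""
    a = softmax(z)
    return -math.log(a[y])

def cost(logits, labels):
    """Average the per-sample losses over the m samples in the batch."""
    m = len(labels)
    return sum(sample_loss(z, y) for z, y in zip(logits, labels)) / m

# Hypothetical batch: m = 3 samples, 3 classes.
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [0.0, 0.0, 0.0]]
labels = [0, 1, 2]
print(cost(logits, labels))
```

The softmax outputs sum to 1 within each sample, while the 1/m only shows up when the per-sample losses are averaged across the batch, so the two normalizations are independent of each other.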

We don’t use the 1-y trick here because more than 2 classes are assumed. Instead, we use the indicator function $\mathbb{1}$, as defined in the first line of your screenshot.
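
On the derivative question, here is a short sketch. It assumes the screenshot shows the standard softmax cross-entropy loss; the symbols $N$ (number of classes), $z_j$ (logits) and $a_j$ (softmax outputs) are my own notation and may differ from the slide.

$$
L(\mathbf{a}, y) = -\sum_{j=1}^{N} \mathbb{1}\{y = j\}\,\log a_j,
\qquad
a_j = \frac{e^{z_j}}{\sum_{k=1}^{N} e^{z_k}}
$$

The indicator plays the role of the 1-y trick: it keeps the loss in one piece, and only the term with $j = y$ survives, so $L = -\log a_y$. Differentiating with respect to each logit gives

$$
\frac{\partial L}{\partial z_j} = a_j - \mathbb{1}\{y = j\}
$$

For two classes labelled 0 and 1, the two indicator terms reduce to $y$ and $1-y$, which recovers the logistic loss.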