I used tf.keras.losses.categorical_crossentropy and also tf.reduce_sum.
The logits and labels have shape (6, num_examples), so I applied tf.transpose to them to get the shape that categorical_crossentropy expects.
At the end I divided the result by the number of examples.
I still get a big difference.
In this function we are computing the sum, not the average of the costs. The other thing to check is to make sure you used the from_logits parameter correctly. We are passing the “logits” here and not the softmax output values, right? Here’s a thread which talks about that and why it is done that way.

Here’s a thread which talks about why it’s the sum, not the average.
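
To make the two points above concrete, here is a minimal sketch (not the assignment's reference solution). The function name `total_cross_entropy` and the toy data are made up for illustration. It assumes logits and labels arrive with shape (num_classes, num_examples), as described in the question.

```python
import tensorflow as tf


def total_cross_entropy(logits, labels):
    """Return the SUM of per-example cross-entropy costs.

    Both arguments are assumed to have shape (num_classes, num_examples).
    `total_cross_entropy` is a hypothetical name used only for this sketch.
    """
    # categorical_crossentropy expects (num_examples, num_classes),
    # so transpose both tensors first.
    per_example = tf.keras.losses.categorical_crossentropy(
        tf.transpose(labels),   # y_true comes first
        tf.transpose(logits),   # y_pred: raw logits, not softmax outputs
        from_logits=True,       # softmax is applied inside the loss
    )
    # Sum over the examples; no division by the number of examples.
    return tf.reduce_sum(per_example)


# Hypothetical toy data: 3 classes, 2 examples.
logits = tf.constant([[2.0, 0.5],
                      [1.0, 2.5],
                      [0.1, 0.3]])
labels = tf.constant([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.0, 0.0]])

total = total_cross_entropy(logits, labels)
print(float(total))
```

If `from_logits=True` is left out, the function treats the logits as if they were already probabilities, which typically gives a very different number. Dividing the result by the number of examples turns the sum into an average, which also changes the value.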

Thank you. After setting the from_logits parameter and removing the division, it went smoothly.
My confusion came from this part of the text in the exercise:

The second link I gave you in my previous reply explains exactly that point. Please have another look at it.