They aren’t probabilities. What matters is that the largest value, 0.77, sits at position [2] of the array, and np.argmax finds the index of the largest value. The network has been trained to output larger numbers for the class it thinks matches and smaller numbers for the classes that don’t, which is why most of the entries are negative.
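As a small sketch (the array values here are made up, not the actual model output), np.argmax simply picks the index of the largest entry, sign notwithstanding:

```python
import numpy as np

# Hypothetical raw model outputs (logits) for four classes.
prediction = np.array([-3.1, -0.9, 0.77, -2.4])

# np.argmax returns the index of the largest value.
best_class = np.argmax(prediction)
print(best_class)  # → 2
```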

In the next step, the code does the same thing using softmax, which does return probabilities: every entry of the prediction_p array is between 0 and 1, and the entries sum to 1.
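A minimal softmax sketch (using the same made-up logits as above, and assuming the name prediction_p from the course code) shows that the outputs sum to 1:

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating; this is a standard
    # numerical-stability trick and does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

prediction = np.array([-3.1, -0.9, 0.77, -2.4])
prediction_p = softmax(prediction)

print(prediction_p.sum())  # → 1.0 (within floating-point error)
```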

Just to add: since the sigmoid and softmax functions are monotonic, meaning they preserve the ordering of their inputs, it doesn’t matter whether the activation is applied when all you want is the output with the highest value.
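You can verify the monotonicity point directly: argmax over the raw logits and argmax over the softmax probabilities pick the same index (again with hypothetical values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([-3.1, -0.9, 0.77, -2.4])
probs = softmax(logits)

# Softmax preserves the ordering of its inputs, so the winning
# index is the same whether or not it is applied.
assert np.argmax(logits) == np.argmax(probs)
```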

One other thing I want to add is that those numbers are called logits, and they can be positive or negative. Softmax transforms logits into probabilities. The model outputs raw logits so the loss function can take advantage of the from_logits=True option, which has been explained in this video by Andrew.
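The reason from_logits=True helps is numerical stability: computing log(softmax(z)) naively can overflow or underflow for large logits, whereas computing the loss directly from the logits uses the log-sum-exp trick. Here is a NumPy sketch of that idea (this illustrates the principle, not the actual Keras internals):

```python
import numpy as np

def cross_entropy_from_logits(logits, true_class):
    # log-softmax via the log-sum-exp trick: stable even for huge logits.
    m = np.max(logits)
    log_softmax = logits - m - np.log(np.sum(np.exp(logits - m)))
    return -log_softmax[true_class]

logits = np.array([1000.0, -5.0, 3.0])  # extreme values on purpose
# Going through softmax first would give nan/-inf here, because
# np.exp(1000.0) overflows. The from-logits route stays finite.
loss = cross_entropy_from_logits(logits, 0)
print(np.isfinite(loss))  # → True
```

In Keras this is why you leave the last layer with a linear activation and pass from_logits=True to the loss, e.g. tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True).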