Error (video: Softmax) in the graph of the loss function

I believe the graph of the -log(a_j) function (shown at 10:00 in the Softmax video) should cross the horizontal axis and stop at a_j = 1, where loss = 0. Since loss = -log(a_j), inverting gives a_j = numpy.exp(-loss), and exp(0) = 1, so the loss is exactly 0 at a_j = 1. Also, a probability a_j cannot exceed 1 (i.e. 100%), so the curve should not extend past that point.
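As a quick sanity check, here is a minimal NumPy sketch (the array values are just examples of my own, not from the video):

```python
import numpy as np

# Softmax probabilities a_j lie in (0, 1], so the loss -log(a_j)
# is only defined on that interval and reaches 0 at a_j = 1.
a = np.array([0.1, 0.5, 1.0])
loss = -np.log(a)

print(loss)           # loss at a_j = 1 is exactly 0.0
print(np.exp(-loss))  # inverting the loss recovers a_j
```

The last entry of `loss` is exactly 0, confirming that the curve meets the axis at a_j = 1 rather than continuing beyond it.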

I agree with you, @Robert_Ascan; I am filing a suggestion about this.

Thank you.