Why does the NER assignment train with cross-entropy loss without comparing one-hot targets to probabilities?

In the C3W3 programming assignment, we are training a model to perform classification and using the cross-entropy loss, which works with:

  • targets: usually the one-hot encoded versions of the class indices (the right class has p=1 and the wrong classes all have p=0).
  • predictions: usually a vector of probabilities whose values sum to 1.
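For reference, the textbook definition above can be sketched in plain NumPy (the function name and the example numbers are my own, not from the assignment):

```python
import numpy as np

def cross_entropy(one_hot_targets, probs):
    """Cross-entropy: -sum over classes of target * log(prediction)."""
    return -np.sum(one_hot_targets * np.log(probs), axis=-1)

# Three classes; the right class is index 1, so its one-hot target is [0, 1, 0].
targets = np.array([0.0, 1.0, 0.0])
probs = np.array([0.1, 0.7, 0.2])     # predictions: probabilities summing to 1
loss = cross_entropy(targets, probs)  # only the target class's log-prob survives
```

Because the target is one-hot, the sum collapses to the negative log of the probability assigned to the correct class, here -log(0.7).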

However, I noticed in the assignment, as well as in the video Training NERs: Data Processing, that:

  • the targets are integers (the actual indices of the named entity classes).
  • the predictions are the log-softmax of the final activations of the model, which are not probabilities that sum to 1.

While typing up what would have been a question, I found the reason for this in the trax documentation: tl.CrossEntropyLoss is deprecated and, despite its name, does not compute the full cross-entropy itself. It needs a tl.LogSoftmax layer before it to work correctly. It also seems to convert integer class indices to one-hot representations automatically, although this is not mentioned in the docs.
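To see why the two setups agree, here is a small NumPy sketch (my own illustration, not the trax source): one-hot encoding an integer index and dotting it with the log-softmax output reduces to simply picking out the target class's log-probability, which is exactly the cross-entropy value.

```python
import numpy as np

# Final activations of the model for one token, over 3 entity classes.
logits = np.array([2.0, 1.0, 0.1])
log_probs = logits - np.log(np.sum(np.exp(logits)))  # what tl.LogSoftmax produces

target = 1  # integer class index, as stored in the assignment's labels

# One-hot the index and take the negative dot product with the log-probs...
one_hot = np.eye(3)[target]
loss_one_hot = -np.dot(one_hot, log_probs)

# ...which is the same as indexing the target class's log-probability.
loss_gather = -log_probs[target]
```

So a loss layer fed log-probabilities plus integer targets computes the same number as the one-hot/probability formulation, just more efficiently.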

Hey @tetamusha,
Thanks for posting the answer alongside the question. I am sure it will help other learners who run into this.