In the C3W3 programming assignment, we train a model to perform classification using the cross-entropy loss, which works with:
- targets: usually the one-hot encoded versions of the class indices (the right class has p=1 and all wrong classes have p=0).
- predictions: usually a vector of probabilities whose values sum to 1 (see the numpy sketch after this list).
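For reference, here is a minimal numpy sketch of that textbook formulation; the array values are made up purely for illustration:

```python
import numpy as np

# One-hot targets for 2 examples over 3 classes: true classes are 0 and 2.
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])

# Predicted probability distributions; each row sums to 1.
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])

# Cross-entropy per example: -sum over classes of target * log(prediction).
loss_per_example = -np.sum(targets * np.log(predictions), axis=1)
print(loss_per_example)         # approximately [0.357, 0.511]
print(loss_per_example.mean())  # mean loss over the batch
```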
However, I noticed in the assignment, as well as in the video Training NERs: Data Processing, that:
- the targets are integers (the actual indices of the named entity classes).
- the predictions are the log-softmax of the model's final activations, i.e. log-probabilities, which do not sum to 1 (see the sketch after this list).
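These two views turn out to be equivalent: with one-hot targets, the sum over classes in the cross-entropy collapses to picking out the log-probability at the true class index, so integer targets plus log-softmax outputs give the same loss. A small numpy sketch of that equivalence (values made up):

```python
import numpy as np

# Integer class indices for 2 examples, as in the assignment.
targets = np.array([0, 2])

# Raw final activations (logits) for 2 examples over 3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.2, 3.0]])

# Log-softmax: log-probabilities (np.exp of each row sums to 1).
log_probs = logits - np.log(np.sum(np.exp(logits), axis=1, keepdims=True))

# The one-hot dot product collapses to indexing at the target class.
loss_per_example = -log_probs[np.arange(len(targets)), targets]
print(loss_per_example)
```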
While typing up what was going to be a question, I found the reason for this in the trax documentation: tl.CrossEntropyLoss is deprecated and does not, by itself, actually compute cross-entropy. It needs a tl.LogSoftmax layer before it to work correctly. It also seems to convert integer class indices to one-hot representations automatically, although this is not mentioned in the docs.
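Putting it together, the pattern the assignment uses (as far as I can tell) is a model that ends in tl.LogSoftmax, trained with tl.CrossEntropyLoss as the loss layer. A rough sketch, with made-up layer sizes and a hypothetical train_generator standing in for the assignment's data pipeline:

```python
import trax
from trax import layers as tl

# Hypothetical sizes, just for illustration; not the assignment's actual values.
vocab_size, d_model, n_tags = 10000, 64, 9

model = tl.Serial(
    tl.Embedding(vocab_size=vocab_size, d_feature=d_model),
    tl.LSTM(n_units=d_model),
    tl.Dense(n_units=n_tags),
    tl.LogSoftmax(),  # CrossEntropyLoss expects log-probabilities from here
)

train_task = trax.supervised.training.TrainTask(
    labeled_data=train_generator,      # assumed to yield (inputs, integer targets, weights)
    loss_layer=tl.CrossEntropyLoss(),  # one-hots the integer targets internally
    optimizer=trax.optimizers.Adam(0.01),
)
```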