Why does the NER assignment train with the cross-entropy loss without comparing one-hot targets to probabilities?

In the C3W3 programming assignment, we train a model to perform classification using the cross-entropy loss, which conventionally works with:

  • targets: usually the one-hot encoded versions of the class indices (the correct class has p=1 and all wrong classes have p=0).
  • predictions: usually a vector of probabilities whose values sum to 1 (see the sketch after this list).
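
As a minimal NumPy sketch of this textbook formulation (the class count and all values here are made up for illustration):

```python
import numpy as np

# Hypothetical 3-class example; the correct class is index 1.
target = np.array([0.0, 1.0, 0.0])      # one-hot target
prediction = np.array([0.2, 0.7, 0.1])  # probabilities, summing to 1

# Cross-entropy: -sum_i target_i * log(prediction_i)
loss = -np.sum(target * np.log(prediction))
print(loss)  # -log(0.7) ≈ 0.357
```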

However, I noticed in the assignment, as well as in the video Training NERs: Data Processing, that:

  • the targets are integers (the actual indices of the named-entity classes).
  • the predictions are the log-softmax of the final activations of the model, i.e. log-probabilities, which do not sum to 1 (see the sketch after this list).
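
Here is a sketch of what the assignment's setup feeds the loss instead (values again made up); note that selecting the log-softmax entry at the integer target index yields exactly the same number as the one-hot computation above:

```python
import numpy as np

# Made-up final activations (logits) of the model for one token.
logits = np.array([1.0, 2.0, 0.5])

# Log-softmax: log-probabilities. These do not sum to 1 (their exps do).
log_probs = logits - np.log(np.sum(np.exp(logits)))

# Integer target: the index of the correct named-entity class.
target_index = 1

# With a one-hot target, the cross-entropy sum collapses to the negative
# log-probability of the correct class, so the integer index suffices:
loss = -log_probs[target_index]
print(loss)
```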

While typing this up as a question, I found the reason in the trax documentation: tl.CrossEntropyLoss is deprecated and does not compute the full cross-entropy on its own. It expects tl.LogSoftmax to come before it, so it receives log-probabilities rather than raw activations. It also seems to accept integer class indices directly, treating them as their one-hot representations, although this is not mentioned in the docs. That works out because, with a one-hot target, the cross-entropy sum collapses to the negative log-probability of the single correct class, so the integer index is all the loss needs to select that entry.
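
For concreteness, here is a sketch of how the two pieces fit together in trax, along the lines of the assignment's setup; the layer sizes and the data generator below are placeholders I made up, not the assignment's actual values:

```python
import numpy as np
import trax
from trax import layers as tl

# Model sketch: Dense logits followed by LogSoftmax, so the network
# emits log-probabilities (sizes here are placeholders).
model = tl.Serial(
    tl.Embedding(vocab_size=100, d_feature=50),  # token ids -> vectors
    tl.LSTM(n_units=50),
    tl.Dense(n_units=17),                        # one unit per NER tag
    tl.LogSoftmax(),
)

# Hypothetical generator of (token ids, integer tag indices, weights) batches.
def train_generator():
    while True:
        tokens = np.random.randint(0, 100, size=(8, 30))
        tags = np.random.randint(0, 17, size=(8, 30))
        weights = np.ones_like(tags, dtype=np.float32)
        yield tokens, tags, weights

# tl.CrossEntropyLoss then pairs the model's log-probabilities with the
# integer targets coming from the generator.
train_task = trax.supervised.training.TrainTask(
    labeled_data=train_generator(),
    loss_layer=tl.CrossEntropyLoss(),
    optimizer=trax.optimizers.Adam(0.01),
)
```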

Hey @tetamusha,
Thanks for posting the answer alongside the question. I am sure it will help all the learners who come across this thread.

Cheers,
Elemento