Question about using a NN to predict sentiment

In NLP Specialization Course 3, using a NN to predict sentiment, it was not clear to me how the model trains on the given target data. The target data is a one-dimensional array of 1s and 0s for y (positive or negative), yet the model has an output dimension of 2; only by comparing the two outputs can you decide whether the prediction is 1 or 0. All of this handling is somehow hidden from the code. How does Trax know to do such a conversion when calculating the loss function? Do the two lines below inform it?

loss_layer=tl.WeightedCategoryCrossEntropy(),
optimizer=trax.optimizers.Adam(0.01),

def get_train_eval_tasks(train_pos, train_neg, val_pos, val_neg, vocab_dict, loop, batch_size=16):

    rnd.seed(271)

    train_task = training.TrainTask(
        labeled_data=train_generator(batch_size, train_pos, train_neg,
                                     vocab_dict, loop, shuffle=True),
        loss_layer=tl.WeightedCategoryCrossEntropy(),
        optimizer=trax.optimizers.Adam(0.01),
        n_steps_per_checkpoint=10,
    )

    eval_task = training.EvalTask(
        labeled_data=val_generator(batch_size, val_pos, val_neg,
                                   vocab_dict, loop, shuffle=True),
        metrics=[tl.WeightedCategoryCrossEntropy(), tl.WeightedCategoryAccuracy()],
    )

    return train_task, eval_task

train_task, eval_task = get_train_eval_tasks(train_pos, train_neg, val_pos, val_neg, Vocab, True, batch_size = 16)

The input batch's shape is (4, 14):
input tensor: [3 4 5 6 7 8 9 0 0 0 0 0 0 0]; target 1; example weights 1
input tensor: [10 11 12 13 14 15 16 17 18 19 20 9 21 22]; target 1; example weights 1
input tensor: [5738 2901 3761 0 0 0 0 0 0 0 0 0 0 0]; target 0; example weights 1
input tensor: [ 858 256 3652 5739 307 4458 567 1230 2767 328 1202 3761 0 0]; target 0; example weights 1


Hi @PZ2004

That is a good question. The same mechanism extends to predicting not only 0 or 1 but also 2, 3, ..., v (where v could be the vocabulary length).

In simple terms:
Trax is programmed so that it knows which output's probability to compute the loss on. For example, if the target is 0, Trax picks the first entry of the output, checks whether it is high or low, and later updates the weights accordingly. The same would happen if the target were, say, 23042 (the word at index 23042): Trax would pick the output (probability) at index 23042 and check whether it is high or low. This is how it knows how to update the weights: if that probability is high, the loss is low, and vice versa.

In concrete terms:
An example output from the model is of shape (16, 2) (batch size, output size)
An example target is of shape (16, 1)
Trax only looks at the column of the output where the target is.

An extended example for language modeling:
An example output from the model is of shape (32, 33042) (batch size, vocabulary size)
An example target is of shape (32, 1)
Trax only looks at the column of the output where the target is.
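To make the "pick the target's column" step concrete, here is a minimal NumPy sketch of what a weighted categorical cross-entropy does with integer targets. The probability values are made up for illustration; this stands in for Trax's `tl.WeightedCategoryCrossEntropy`, which operates on the model's log-probabilities:

```python
import numpy as np

# Hypothetical model output: log-probabilities for a batch of 4 examples,
# 2 classes (0 = negative, 1 = positive).
log_probs = np.log(np.array([
    [0.1, 0.9],   # predicts positive
    [0.2, 0.8],   # predicts positive
    [0.7, 0.3],   # predicts negative
    [0.6, 0.4],   # predicts negative
]))

# Integer targets, shape (4,) -- no one-hot encoding needed.
targets = np.array([1, 1, 0, 0])
weights = np.ones(len(targets))

# For each example, pick the log-probability of its target class.
# This is the "only look at the column where the target is" step.
picked = log_probs[np.arange(len(targets)), targets]

# Weighted cross-entropy: weighted negative mean of the picked log-probs.
loss = -np.sum(picked * weights) / np.sum(weights)
print(loss)  # small when the target class's probability is high
```

Because the targets are used as indices into the output, no explicit conversion of the (batch,) target array to a (batch, 2) array is ever needed; that is why nothing of the sort appears in the assignment code.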

I hope that makes sense :slight_smile:

Cheers