Question about using a NN to predict sentiment

In NLP Specialization Course 3, using a NN to predict sentiment, it was not clear to me how the model trains on the given target data. The target data is a one-dimensional array of 1s and 0s for y (positive or negative), yet the model has an output dimension of 2; only by comparing the two outputs can you decide whether the prediction is 1 or 0. All of this handling is somehow hidden from the code. How does Trax know to do such a conversion when calculating the loss function? Do the two lines below inform it?

loss_layer=tl.WeightedCategoryCrossEntropy(),
optimizer=trax.optimizers.Adam(0.01),

def get_train_eval_tasks(train_pos, train_neg, val_pos, val_neg, vocab_dict, loop, batch_size=16):

    rnd.seed(271)

    train_task = training.TrainTask(
        labeled_data=train_generator(batch_size, train_pos, train_neg,
                                     vocab_dict, loop, shuffle=True),
        loss_layer=tl.WeightedCategoryCrossEntropy(),
        optimizer=trax.optimizers.Adam(0.01),
        n_steps_per_checkpoint=10,
    )

    eval_task = training.EvalTask(
        labeled_data=val_generator(batch_size, val_pos, val_neg,
                                   vocab_dict, loop, shuffle=True),
        metrics=[tl.WeightedCategoryCrossEntropy(), tl.WeightedCategoryAccuracy()],
    )

    return train_task, eval_task

train_task, eval_task = get_train_eval_tasks(train_pos, train_neg, val_pos, val_neg, Vocab, True, batch_size = 16)

The input batch's shape is (4, 14):
input tensor: [3 4 5 6 7 8 9 0 0 0 0 0 0 0]; target 1; example weights 1
input tensor: [10 11 12 13 14 15 16 17 18 19 20 9 21 22]; target 1; example weights 1
input tensor: [5738 2901 3761 0 0 0 0 0 0 0 0 0 0 0]; target 0; example weights 1
input tensor: [ 858 256 3652 5739 307 4458 567 1230 2767 328 1202 3761 0 0]; target 0; example weights 1


Hi @PZ2004

That is a good question. The same mechanism extends to predicting not only 0 or 1 but also 2, 3, ..., v (where v could be the vocabulary length).

In simple terms:
Trax is programmed so that it knows which output's probability to compute the loss on. For example, if the target is 0, Trax picks the first entry of the output, checks whether it is high or low, and later updates the weights accordingly. The same would happen if the target were, say, 23042 (the word at index 23042): Trax would pick the output (probability) at index 23042 and check whether it is high or low. This is how it knows how to update the weights: if that probability is high, the loss is low, and vice versa.

In concrete terms:
An example output from the model is of shape (16, 2) (batch size, output size)
An example target is of shape (16, 1)
Trax only looks at the column of the output where the target is.

An extended example for language modeling:
An example output from the model is of shape (32, 33042) (batch size, vocabulary size)
An example target is of shape (32, 1)
Trax only looks at the column of the output where the target is.
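To make the "pick the target's column" step concrete, here is a minimal NumPy sketch of what a weighted categorical cross-entropy does with integer targets. The probability values are made up for illustration; this stands in for Trax's `tl.WeightedCategoryCrossEntropy`, which operates on the model's log-probabilities:

```python
import numpy as np

# Hypothetical model output: log-probabilities for a batch of 4 examples,
# 2 classes (0 = negative, 1 = positive).
log_probs = np.log(np.array([
    [0.1, 0.9],   # predicts positive
    [0.2, 0.8],   # predicts positive
    [0.7, 0.3],   # predicts negative
    [0.6, 0.4],   # predicts negative
]))

# Integer targets, shape (4,) -- no one-hot encoding needed.
targets = np.array([1, 1, 0, 0])
weights = np.ones(len(targets))

# For each example, pick the log-probability of its target class.
# This is the "only look at the column where the target is" step.
picked = log_probs[np.arange(len(targets)), targets]

# Weighted cross-entropy: weighted negative mean of the picked log-probs.
loss = -np.sum(picked * weights) / np.sum(weights)
print(loss)  # small when the target class's probability is high
```

Because the targets are used as indices into the output, no explicit conversion of the (batch,) target array to a (batch, 2) array is ever needed; that is why nothing of the sort appears in the assignment code.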

I hope that makes sense :slight_smile:

Cheers