Using Keras' Tokenizer yields values that start at 1 rather than at 0?

In the week-2 assignment, I came across a statement in the notebook that says “Using Keras’ Tokenizer yields values that start at 1 rather than at 0” and asks me to subtract 1 from each value in the output array. Why am I doing this? The number of values in the output array (label_seq_np) is 1780, and from this I understand that there will be 1780 neurons in our first (input) layer. Instead of subtracting 1, couldn’t we work around this by adding one extra neuron to the architecture?

Numeric labels produced by Keras’ Tokenizer start at 1 because index 0 is reserved for internal use (in particular, padding).
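
For example, here is a minimal sketch (the label strings are illustrative placeholders, not the assignment’s actual categories):

```python
from tensorflow.keras.preprocessing.text import Tokenizer
import numpy as np

# Hypothetical label strings standing in for the assignment's categories
labels = ["sport", "business", "tech", "sport"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(labels)
print(tokenizer.word_index)   # {'sport': 1, 'business': 2, 'tech': 3} -- starts at 1

label_seq = tokenizer.texts_to_sequences(labels)
print(label_seq)              # [[1], [2], [3], [1]]

# Shift into [0, num_classes - 1] so the labels match the output units
label_seq_np = np.array(label_seq) - 1
print(label_seq_np)           # [[0], [1], [2], [0]]
```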

A multiclass classification model has its number of output units set to the number of classes. Here’s what happens during a single forward pass:

  1. Feed input to the model
  2. For each layer:
    a. Compute the linear transformation dotProduct(weights, input) + bias
    b. Apply the layer’s activation function to the output of the previous step.
    c. Feed this output to the next layer.
  3. In the final layer, once you apply the appropriate activation function (softmax for multiclass classification), each output represents the probability that the input belongs to the corresponding class. The prediction for an input is the argmax of the output layer’s values, as the sketch after this list illustrates.
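
As a sketch of this setup (the layer sizes, vocabulary size, and loss choice here are assumptions for illustration, not the assignment’s exact values):

```python
import tensorflow as tf

num_classes = 5   # assumption: e.g., five label categories

# One output unit per class; softmax turns the final outputs into probabilities
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(24, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax'),
])

# sparse_categorical_crossentropy expects integer labels in [0, num_classes - 1],
# which is exactly what the "subtract 1" step produces
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
```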

The prediction is therefore in the range [0, num_classes - 1], which is why 1 is subtracted from the labels. You could add another output neuron to match the 1-based labels, but it isn’t required and only adds computation. This becomes apparent in binary classification: a model with 2 output units is slower than the equivalent model with a single unit.
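
A small numeric illustration (the probability values are made up):

```python
import numpy as np

# Hypothetical softmax output of a 3-class model for one input
probs = np.array([0.10, 0.75, 0.15])

pred = np.argmax(probs)   # 1 -- always in [0, num_classes - 1]

# A tokenizer-produced label for the same example starts at 1,
# so it must be shifted down before comparing or computing the loss
tokenizer_label = 2
assert pred == tokenizer_label - 1
```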