Getting a vocab size of 26871 instead of 138858

pyaj · November 1, 2021, 1:26pm

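import random
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# corpus, training_size, max_length, padding_type, trunc_type and
# test_portion are defined earlier in the notebook.
sentences = []
labels = []
random.shuffle(corpus)
# Keep only the first training_size (text, label) pairs of the shuffled corpus.
for x in range(training_size):
    sentences.append(corpus[x][0])
    labels.append(corpus[x][1])

# Build the vocabulary from the selected sentences only.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)

word_index = tokenizer.word_index
vocab_size = len(word_index)

sequences = tokenizer.texts_to_sequences(sentences)
padded = pad_sequences(sequences, maxlen=max_length, padding=padding_type, truncating=trunc_type)

# The first `split` examples become the test set, the rest the training set.
split = int(test_portion * training_size)

test_sequences = padded[:split]
training_sequences = padded[split:training_size]
test_labels = labels[:split]
training_labels = labels[split:training_size]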

pyaj · November 2, 2021, 4:23am

I cannot pass the exam if I do not get it right

pyaj · November 2, 2021, 8:11am

The error went away after I changed the training size a few times!
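
For anyone who hits the same mismatch: vocab_size is the number of distinct words in the sentences passed to fit_on_texts, so it presumably grows with training_size. Below is a minimal sketch of that effect. It uses a plain-Python word count as a stand-in for the Keras Tokenizer, and the helper name vocab_size_for and the toy corpus are made up for illustration, not taken from the assignment.

```python
import random


def vocab_size_for(corpus, training_size, seed=0):
    """Count distinct lowercase, whitespace-separated words in the first
    training_size sentences of a shuffled copy of corpus.

    Simplified stand-in for len(Tokenizer().word_index); the real Keras
    Tokenizer also strips punctuation, which this sketch ignores.
    """
    rng = random.Random(seed)
    shuffled = corpus[:]          # leave the caller's list untouched
    rng.shuffle(shuffled)
    words = set()
    for text, _label in shuffled[:training_size]:
        words.update(text.lower().split())
    return len(words)


# Hypothetical toy corpus: each sentence has one unique word plus two shared ones.
toy_corpus = [(f"word{i} common filler", i % 2) for i in range(1000)]

small = vocab_size_for(toy_corpus, 100)    # only 100 sentences seen
full = vocab_size_for(toy_corpus, 1000)    # whole corpus seen
print(small, full)
```

With a smaller training_size the tokenizer sees fewer sentences and therefore fewer distinct words, so a vocab size well below the expected value is a hint to check training_size first.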
