I am having trouble getting my model to run. Any help would be appreciated. Right now, my “MAXLEN” is undefined.
Please don’t post your assignment. I am removing it.
Please search where MAXLEN is defined in the notebook, and make sure you have run that cell to define it.
Sorry about that. There are no answers to be had in that assignment.
I followed your advice. Now I get:
“ValueError: Data cardinality is ambiguous:
x sizes: 144000
y sizes: 16000
Make sure all arrays contain the same number of samples.”
What can I do?
Moved this thread to the TensorFlow Developer Professional category, as the question is about one of its assignments.
Here are some hints:
- In train_val_split, don’t hardcode train_size. Instead, use the number of sentences and training_split to calculate train_size and the validation size.
- In fit_tokenizer, don’t hardcode 160000. For this assignment, you should read up on Tokenizer to see what happens when you don’t provide the num_words parameter.
- In create_model, there is no max_length. See the function signature to pick the right value for input_length.
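For the first hint, the general idea can be sketched like this (this is not the assignment’s solution; the function and variable names here are illustrative):

```python
# Generic sketch of splitting paired data by a fraction.
# The key point: compute the split index from the *length* of the list
# (an int), then slice both lists at that index.
def split_by_fraction(sentences, labels, training_split):
    train_size = int(len(sentences) * training_split)
    train_sentences = sentences[:train_size]
    train_labels = labels[:train_size]
    val_sentences = sentences[train_size:]
    val_labels = labels[train_size:]
    return train_sentences, val_sentences, train_labels, val_labels
```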
When I don’t hardcode train_size, I get an error: “TypeError: can’t multiply sequence by non-int of type ‘float’”
(my code: train_size = int(sentences * training_split)).
When I don’t hardcode fit_tokenizer, I get an error: “NameError: name ‘train_sentences’ is not defined”
(my code:
tokenizer = Tokenizer(num_words=vocab_size, oov_token='')
# Fit the tokenizer to the training sentences
tokenizer.fit_on_texts(train_sentences))
As for your third point, please send me the function signature. Thanks!
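For reference, that TypeError comes from multiplying a Python list by a float: sequences only support repetition by an integer. A minimal illustration (the variable names are made up):

```python
sentences = ["hello world", "foo bar", "baz qux", "more text"]
training_split = 0.8

# Multiplying a list by a float raises TypeError: list repetition
# (e.g. ["a"] * 3) only accepts an int.
try:
    train_size = int(sentences * training_split)
except TypeError as e:
    print(e)  # can't multiply sequence by non-int of type 'float'

# Use the list's length instead, which is an int:
train_size = int(len(sentences) * training_split)
print(train_size)  # 3
```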
issue #1: resolved!!!
This code:
“# Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
# Fit the tokenizer to the training sentences
tokenizer.fit_on_texts(train_sentences)”
What has gone wrong here?
Here’s the method signature you’re looking for:
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
As far as the Tokenizer issue is concerned, what happens when you don’t specify the num_words parameter?
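As a quick illustration of that question (assuming TensorFlow is installed; the "&lt;OOV&gt;" string and sample texts here are just examples, not the assignment’s values): Keras’s Tokenizer records every word seen during fitting in word_index whether or not num_words is given; num_words only caps how many top words texts_to_sequences keeps.

```python
from tensorflow.keras.preprocessing.text import Tokenizer

texts = ["the cat sat", "the dog sat"]

# Without num_words: word_index contains every fitted word.
tok_all = Tokenizer(oov_token="<OOV>")
tok_all.fit_on_texts(texts)

# With num_words: word_index is *still* complete; num_words only limits
# which words texts_to_sequences will emit.
tok_capped = Tokenizer(num_words=3, oov_token="<OOV>")
tok_capped.fit_on_texts(texts)

print(len(tok_all.word_index))     # 5: <OOV>, the, sat, cat, dog
print(len(tok_capped.word_index))  # also 5
```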
[code removed - moderator]
"### START CODE HERE
# Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
NameError Traceback (most recent call last)
in
1 # Test your function
----> 2 tokenizer = fit_tokenizer(train_sentences, oov_token)
3 word_index = tokenizer.word_index
4 vocab_size = len(word_index)
5
NameError: name ‘oov_token’ is not defined"
My code includes num_words=vocab_size in the tokenizer. I think this is what you’re referring to, but the error persists.
The notebook starter code has this line:
tokenizer = fit_tokenizer(train_sentences, OOV_TOKEN)
Your notebook has this:
tokenizer = fit_tokenizer(train_sentences, oov_token)
Please refresh your workspace and try again.
See the Refresh your Lab Workspace section here.
On a related note, look at the signature: def fit_tokenizer(train_sentences, oov_token):. Inside the notebook, you are referring to an undefined variable oov_tok. Please fix that as well.
Moving forward, don’t post your notebook in public.
After the changes you suggested, I still get: “NameError: name ‘vocab_size’ is not defined.”
Here’s the relevant code:
“def fit_tokenizer(train_sentences, OOV_TOKEN):
    """
    Instantiates the Tokenizer class on the training sentences

    Args:
        train_sentences (list of string): lower-cased sentences without stopwords to be used for training
        oov_token (string) - symbol for the out-of-vocabulary token

    Returns:
        tokenizer (object): an instance of the Tokenizer class containing the word-index dictionary
    """
    ### START CODE HERE
    # Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
    tokenizer = Tokenizer(num_words=vocab_size, OOV_TOKEN=oov_tok)
    # Fit the tokenizer to the training sentences
    tokenizer.fit_on_texts(train_sentences)
    ### END CODE HERE
    return tokenizer

# Test your function
tokenizer = fit_tokenizer(train_sentences, OOV_TOKEN)
word_index = tokenizer.word_index
vocab_size = len(word_index)
print(f"Vocabulary contains {vocab_size} words\n")
print(" token included in vocabulary" if "" in word_index else " token NOT included in vocabulary")
print(f"\nindex of word 'i' should be {word_index['i']}")
NameError                                 Traceback (most recent call last)
in
      1 # Test your function
----> 2 tokenizer = fit_tokenizer(train_sentences, OOV_TOKEN)
      3 word_index = tokenizer.word_index
      4 vocab_size = len(word_index)
      5

in fit_tokenizer(train_sentences, OOV_TOKEN)
     13
     14     # Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
---> 15     tokenizer = Tokenizer(num_words=vocab_size, OOV_TOKEN=oov_tok)
     16     # Fit the tokenizer to the training sentences
     17     tokenizer.fit_on_texts(train_sentences)

NameError: name 'vocab_size' is not defined
"
@vang37 Why are you posting your code in public?
If you want to share your code, click my name and message your notebook as an attachment.
Don’t forget to remove code from your recent post.