The Model for this assignment

I am having trouble getting my model to run. Any help would be appreciated. Right now, my "MAXLEN" is undefined.

Please don’t post your assignment. I am removing it.

Please search where MAXLEN is defined in the notebook, and make sure you have run that cell to define it.


Sorry about that. There are no answers to be had in that assignment.

I followed your advice, and now I get:

"ValueError: Data cardinality is ambiguous:
x sizes: 144000
y sizes: 16000
Make sure all arrays contain the same number of samples."

What can I do?

Moved this thread to the TensorFlow Developer Professional category, as the question is about one of its assignments.

@vang37

Here are some hints. First, notice that 144000 + 16000 = 160000 and 144000 / 160000 = 0.9, so the error strongly suggests your x and y arrays come from mismatched pieces of the same 90/10 split.

  1. In train_val_split, don't hardcode train_size. Instead, use the number of sentences and training_split to calculate train_size and the validation size (see the sketch after this list).
  2. In fit_tokenizer, don't hardcode 160000. For this assignment, you should read up on Tokenizer to see what happens when you don't provide the num_words parameter.
  3. In create_model, there is no max_length. See the function signature to pick the right value for input_length.
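
A minimal sketch of the idea in hint 1, assuming sentences and labels are plain Python lists and training_split is a fraction like 0.9 (the variable names are illustrative, not the exact starter code):

def train_val_split(sentences, labels, training_split):
    # len() returns an int, so the product below can be cast to a slice index;
    # multiplying the list itself by a float would raise a TypeError
    train_size = int(len(sentences) * training_split)

    train_sentences = sentences[:train_size]
    train_labels = labels[:train_size]
    validation_sentences = sentences[train_size:]
    validation_labels = labels[train_size:]

    return train_sentences, validation_sentences, train_labels, validation_labels

With 160000 sentences and a 0.9 split, this yields 144000 training and 16000 validation samples, which matches the two sizes in the error above.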
1. When I don't hardcode train_size, I get an error: "TypeError: can't multiply sequence by non-int of type 'float'"
(my code: train_size = int(sentences*training_split)).

2. When I don't hardcode fit_tokenizer, I get an error: "NameError: name 'train_sentences' is not defined"
(my code:
tokenizer = Tokenizer(num_words=vocab_size, oov_token='<OOV>')
# Fit the tokenizer to the training sentences
tokenizer.fit_on_texts(train_sentences))

3. As for your third point, please send me the function signature. Thanks!

Issue #1: resolved!!!

This code:

"# Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
# Fit the tokenizer to the training sentences
tokenizer.fit_on_texts(train_sentences)"

What has gone wrong here?

Here’s the method signature you’re looking for:
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
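
For illustration, here is one generic way those signature parameters can wire into a Keras model. This is a sketch under assumptions (the layer stack after the embedding is a placeholder, and vocab_size + 1 assumes the embeddings matrix carries an extra row for the padding index), not the assignment's solution:

import tensorflow as tf

def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
    model = tf.keras.Sequential([
        # maxlen (not max_length) is what the signature provides for input_length
        tf.keras.layers.Embedding(vocab_size + 1, embedding_dim,
                                  input_length=maxlen,
                                  weights=[embeddings_matrix],
                                  trainable=False),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model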

As far as the Tokenizer issue is concerned, what happens when you don’t specify the num_words parameter?
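
To see it concretely: leaving num_words out does not shrink word_index at all; the Tokenizer always records the full vocabulary, and num_words only caps which words texts_to_sequences keeps. A quick standalone check with made-up sentences:

from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(oov_token='<OOV>')  # no num_words: nothing is capped
tokenizer.fit_on_texts(['the cat sat', 'the dog ran'])

# word_index contains every word seen, plus the OOV token
print(len(tokenizer.word_index))  # 6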

[code removed - moderator]

"### START CODE HERE

# Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token

NameError Traceback (most recent call last)
in
1 # Test your function
----> 2 tokenizer = fit_tokenizer(train_sentences, oov_token)
3 word_index = tokenizer.word_index
4 vocab_size = len(word_index)
5

NameError: name 'oov_token' is not defined"

My notes include num_words=vocab_size in the Tokenizer call. I think this is what you're referring to, but the error persists.

The notebook starter code has this line:

tokenizer = fit_tokenizer(train_sentences, OOV_TOKEN)

Your notebook has this:

tokenizer = fit_tokenizer(train_sentences, oov_token)

Please refresh your workspace and try again.
See Refresh your Lab Workspace section here

On a related note, look at the signature: def fit_tokenizer(train_sentences, oov_token):. Inside the notebook, you are referring to an undefined variable oov_tok. Please fix that as well.
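
Put differently, the function body should only use names that its signature hands it. A bare-bones sketch of that pattern (not the graded solution, and leaving num_words unset as discussed above):

def fit_tokenizer(train_sentences, oov_token):
    # oov_token is the parameter above; oov_tok and vocab_size do not exist here
    tokenizer = Tokenizer(oov_token=oov_token)
    tokenizer.fit_on_texts(train_sentences)
    return tokenizer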

Moving forward, don’t post your notebook in public.

After the changes you suggested, I still get: "NameError: name 'vocab_size' is not defined."

Here's the relevant code:

"def fit_tokenizer(train_sentences, OOV_TOKEN):
    """
    Instantiates the Tokenizer class on the training sentences

    Args:
        train_sentences (list of string): lower-cased sentences without stopwords to be used for training
        oov_token (string) - symbol for the out-of-vocabulary token

    Returns:
        tokenizer (object): an instance of the Tokenizer class containing the word-index dictionary
    """
    ### START CODE HERE

    # Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
    tokenizer = Tokenizer(num_words=vocab_size, OOV_TOKEN=oov_tok)
    # Fit the tokenizer to the training sentences
    tokenizer.fit_on_texts(train_sentences)

    ### END CODE HERE

    return tokenizer

# Test your function
tokenizer = fit_tokenizer(train_sentences, OOV_TOKEN)
word_index = tokenizer.word_index
vocab_size = len(word_index)

print(f"Vocabulary contains {vocab_size} words\n")
print("<OOV> token included in vocabulary" if "<OOV>" in word_index else "<OOV> token NOT included in vocabulary")
print(f"\nindex of word 'i' should be {word_index['i']}")


NameError Traceback (most recent call last)
in
1 # Test your function
----> 2 tokenizer = fit_tokenizer(train_sentences, OOV_TOKEN)
3 word_index = tokenizer.word_index
4 vocab_size = len(word_index)
5

in fit_tokenizer(train_sentences, OOV_TOKEN)
13
14 # Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
---> 15 tokenizer = Tokenizer(num_words=vocab_size, OOV_TOKEN=oov_tok)
16 # Fit the tokenizer to the training sentences
17 tokenizer.fit_on_texts(train_sentences)

NameError: name 'vocab_size' is not defined
"

@vang37 Why are you posting your code in public?

If you want to share your code, click my name and message your notebook as an attachment.
Don’t forget to remove code from your recent post.