I am having trouble getting my model to run. Any help would be appreciated. Right now, my “MAXLEN” is undefined.
Please don’t post your assignment. I am removing it.
Please search where MAXLEN is defined in the notebook, and make sure you have run that cell to define it.
Sorry about that. There are no answers to be had in that assignment.
I followed your advice. Now I get:
“ValueError: Data cardinality is ambiguous:
x sizes: 144000
y sizes: 16000
Make sure all arrays contain the same number of samples.”
What can I do?
Moved this thread to the TensorFlow Developer Professional category, as the question is about one of its assignments.
Here are some hints:
- In train_val_split, don’t hardcode train_size. Instead, use the number of sentences and training_split to calculate train_size and the validation size.
- In fit_tokenizer, don’t hardcode 160000. For this assignment, you should read up on Tokenizer to see what happens when you don’t provide the num_words parameter.
- In create_model, there is no max_length. See the function signature to pick the right value for input_length.
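For the first hint, the general idea can be sketched like this (this is not the assignment’s solution; the function and variable names here are illustrative):

```python
# Generic sketch of splitting paired data by a fraction.
# The key point: compute the split index from the *length* of the list
# (an int), then slice both lists at that index.
def split_by_fraction(sentences, labels, training_split):
    train_size = int(len(sentences) * training_split)
    train_sentences = sentences[:train_size]
    train_labels = labels[:train_size]
    val_sentences = sentences[train_size:]
    val_labels = labels[train_size:]
    return train_sentences, val_sentences, train_labels, val_labels
```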
When I don’t hardcode train_size, I get an error: “TypeError: can’t multiply sequence by non-int of type ‘float’”
(my code: train_size = int(sentences * training_split)).
When I don’t hardcode fit_tokenizer, I get an error: “NameError: name ‘train_sentences’ is not defined”
(my code:
tokenizer = Tokenizer(num_words=vocab_size, oov_token='')
# Fit the tokenizer to the training sentences
tokenizer.fit_on_texts(train_sentences))
As for your third point, please send me the function signature. Thanks!
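For reference, that TypeError comes from multiplying a Python list by a float: sequences only support repetition by an integer. A minimal illustration (the variable names are made up):

```python
sentences = ["hello world", "foo bar", "baz qux", "more text"]
training_split = 0.8

# Multiplying a list by a float raises TypeError: list repetition
# (e.g. ["a"] * 3) only accepts an int.
try:
    train_size = int(sentences * training_split)
except TypeError as e:
    print(e)  # can't multiply sequence by non-int of type 'float'

# Use the list's length instead, which is an int:
train_size = int(len(sentences) * training_split)
print(train_size)  # 3
```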
issue #1: resolved!!!
This code:
“# Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
# Fit the tokenizer to the training sentences
tokenizer.fit_on_texts(train_sentences)”
What has gone wrong here?
Here’s the method signature you’re looking for:
def create_model(vocab_size, embedding_dim, maxlen, embeddings_matrix):
As far as the Tokenizer issue is concerned, what happens when you don’t specify the num_words parameter?
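As a quick illustration of that question (assuming TensorFlow is installed; the "&lt;OOV&gt;" string and sample texts here are just examples, not the assignment’s values): Keras’s Tokenizer records every word seen during fitting in word_index whether or not num_words is given; num_words only caps how many top words texts_to_sequences keeps.

```python
from tensorflow.keras.preprocessing.text import Tokenizer

texts = ["the cat sat", "the dog sat"]

# Without num_words: word_index contains every fitted word.
tok_all = Tokenizer(oov_token="<OOV>")
tok_all.fit_on_texts(texts)

# With num_words: word_index is *still* complete; num_words only limits
# which words texts_to_sequences will emit.
tok_capped = Tokenizer(num_words=3, oov_token="<OOV>")
tok_capped.fit_on_texts(texts)

print(len(tok_all.word_index))     # 5: <OOV>, the, sat, cat, dog
print(len(tok_capped.word_index))  # also 5
```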
[code removed - moderator]
"### START CODE HERE
# Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
NameError Traceback (most recent call last)
in
1 # Test your function
----> 2 tokenizer = fit_tokenizer(train_sentences, oov_token)
3 word_index = tokenizer.word_index
4 vocab_size = len(word_index)
5
NameError: name ‘oov_token’ is not defined"
My code includes num_words=vocab_size in the tokenizer. I think this is what you’re referring to, but the error persists.
The notebook starter code has this line:
tokenizer = fit_tokenizer(train_sentences, OOV_TOKEN)
Your notebook has this:
tokenizer = fit_tokenizer(train_sentences, oov_token)
Please refresh your workspace and try again.
See the Refresh your Lab Workspace section here.
On a related note, look at the signature: def fit_tokenizer(train_sentences, oov_token):. Inside the notebook, you are referring to an undefined variable oov_tok. Please fix that as well.
Moving forward, don’t post your notebook in public.
After the changes you suggested, I still get: “NameError: name ‘vocab_size’ is not defined.”
Here’s the relevant code:
“def fit_tokenizer(train_sentences, OOV_TOKEN):
    """
    Instantiates the Tokenizer class on the training sentences

    Args:
        train_sentences (list of string): lower-cased sentences without stopwords to be used for training
        oov_token (string) - symbol for the out-of-vocabulary token

    Returns:
        tokenizer (object): an instance of the Tokenizer class containing the word-index dictionary
    """
    ### START CODE HERE
    # Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
    tokenizer = Tokenizer(num_words=vocab_size, OOV_TOKEN=oov_tok)
    # Fit the tokenizer to the training sentences
    tokenizer.fit_on_texts(train_sentences)
    ### END CODE HERE
    return tokenizer

# Test your function
tokenizer = fit_tokenizer(train_sentences, OOV_TOKEN)
word_index = tokenizer.word_index
vocab_size = len(word_index)
print(f"Vocabulary contains {vocab_size} words\n")
print(" token included in vocabulary" if "" in word_index else " token NOT included in vocabulary")
print(f"\nindex of word 'i' should be {word_index['i']}")
NameError                                 Traceback (most recent call last)
in
      1 # Test your function
----> 2 tokenizer = fit_tokenizer(train_sentences, OOV_TOKEN)
      3 word_index = tokenizer.word_index
      4 vocab_size = len(word_index)
      5

in fit_tokenizer(train_sentences, OOV_TOKEN)
     13
     14     # Instantiate the Tokenizer class, passing in the correct values for num_words and oov_token
---> 15     tokenizer = Tokenizer(num_words=vocab_size, OOV_TOKEN=oov_tok)
     16     # Fit the tokenizer to the training sentences
     17     tokenizer.fit_on_texts(train_sentences)

NameError: name 'vocab_size' is not defined
"
@vang37 Why are you posting your code in public?
If you want to share your code, click my name and message your notebook as an attachment.
Don’t forget to remove code from your recent post.