C3W2 Assignment error oct-2024

Good morning,

I think there may be several typos in Assignment C3W2 of the Natural Language Processing in TensorFlow course.

The first one is in this part:
Code removed, as posting code (correct or incorrect) in a public post is against community guidelines.

because in the dataset the order is label first and text second.

The second one is in this part:

Create the label encoder

Code removed, as posting code is against community guidelines.

Same reasoning as the other “bug” mentioned above.

Third one:

Test your code!

unittests.test_preprocess_dataset(preprocess_dataset, vectorizer, label_encoder)

I got the model to work perfectly, but this unit test keeps giving me the following error:
InvalidArgumentError: {{function_node _wrapped__IteratorGetNext_output_types_2_device/job:localhost/replica:0/task:0/device:CPU:0}} assertion failed: [When num_oov_indices=0 all inputs should be in vocabulary, found OOV values ["tv future in the hands of viewers with home theatre systems plasma high-definition tvs and digital video recorders moving into the living room the way people watch tv will be radically different in five years time. that is according to an expert panel which gathered at the annual consumer electronics show in las vegas to discuss how these new technologies will impact one of our favourite pastimes. with the us leading the trend programmes and other content will be delivered to viewers via home networks through cable satellite telecoms companies and broadband service providers to front rooms and portable devices. one of the most talked-about technologies of ces has been digital and personal video recorders (dvr and pvr). these set-top boxes like the us s tivo and the uk s sky+ system allow people to record store play pause and forward wind tv programmes when they want. essentially the technology allows for much more personalised tv. they are also being built-in to high-definition tv sets which are big business in japan and the us but slower to take off in europe because of the lack of high-definition programming. not only can people forward wind through adverts they can also forget about abiding by network and channel schedules putting together their own a-la-carte entertainment. but some us networks and cable and satellite companies are worried about what it means for them in terms of advertising revenues as well as brand identity and viewer loyalty to channels. although the us leads in this technology at the moment it is also a concern that is being raised in europe particularly with the growing uptake of services like sky+. 
what happens here today we will see in nine months to a years time in the uk adam hume the bbc broadcast s futurologist told the bbc news website. for the likes of the bbc there are no issues of lost advertising revenue yet. it is a more pressing issue at the moment for commercial uk broadcasters but brand loyalty is important for everyone. we will be talking more about content brands rather than network brands said tim hanlon from brand communications firm starcom mediavest. the reality is that with broadband connections anybody can be the producer of content. he added: the challenge now is that it is hard to promote a programme with so much choice. what this means said stacey jolna senior vice president of tv guide tv group is that the way people find the content they want to watch has to be simplified for tv viewers. it means that networks in us terms or channels could take a leaf out of google s book and be the search engine of the future instead of the scheduler to help people find what they want to watch. this kind of channel model might work for the younger ipod generation which is used to taking control of their gadgets and what they play on them. but it might not suit everyone the panel recognised. older generations are more comfortable with familiar schedules and channel brands because they know what they are getting. they perhaps do not want so much of the choice put into their hands mr hanlon suggested. on the other end you have the kids just out of diapers who are pushing buttons already - everything is possible and available to them said mr hanlon. ultimately the consumer will tell the market they want. of the 50 000 new gadgets and technologies being showcased at ces many of them are about enhancing the tv-watching experience. high-definition tv sets are everywhere and many new models of lcd (liquid crystal display) tvs have been launched with dvr capability built into them instead of being external boxes. 
one such example launched at the show is humax s 26-inch lcd tv with an 80-hour tivo dvr and dvd recorder. one of the us s biggest satellite tv companies directtv has even launched its own branded dvr at the show with 100-hours of recording capability instant replay and a search function. the set can pause and rewind tv for up to 90 hours. and microsoft chief bill gates announced in his pre-show keynote speech a partnership with tivo called tivotogo which means people can play recorded programmes on windows pcs and mobile devices. all these reflect the increasing trend of freeing up multimedia so that people can watch what they want when they want."], consider setting num_oov_indices=1.]
[[{{node string_lookup_20_1/Assert/Assert}}]] [Op:IteratorGetNext] name:

I checked further, and all the labels are in the label_encoder, so there should be no need for num_oov_indices=1.

I can pass every part of the assignment except this one. Could you check whether it is correct?

Thanks in advance

There are two threads that can help you debug your vectorizer code and your dataset code.

Both pieces of code you wrote were incorrect, which is why one of the unit tests caught the error. Remember that passing a test cell, or being able to run a model, doesn’t always mean your code is correct.

Your error clearly states that the values being looked up do not exist in the vocabulary built by the code you wrote.
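
In the error message above, the value reported as OOV is an entire article, not a category name, which suggests that text reached a lookup whose vocabulary holds only labels. Below is a minimal plain-Python sketch of that behaviour. It is a toy stand-in, not the TensorFlow API: `ToyStringLookup` and the label names are assumptions made up for illustration.

```python
class ToyStringLookup:
    """Plain-Python stand-in for a string lookup layer (not the TensorFlow API)."""

    def __init__(self, vocabulary, num_oov_indices=0):
        self.num_oov_indices = num_oov_indices
        # With one OOV slot, index 0 is reserved for unknown values
        # and the known vocabulary is shifted up by one.
        self.index = {
            token: i + num_oov_indices for i, token in enumerate(vocabulary)
        }

    def __call__(self, value):
        if value in self.index:
            return self.index[value]
        if self.num_oov_indices == 0:
            raise ValueError(
                f"all inputs should be in vocabulary, found OOV value {value!r}"
            )
        return 0  # unknown values collapse into the single OOV slot


# Hypothetical label names, used only for illustration.
labels = ["sport", "business", "politics", "tech", "entertainment"]
strict = ToyStringLookup(labels, num_oov_indices=0)

print(strict("tech"))  # a known label maps to its index

try:
    strict("tv future in the hands of viewers")  # article text, not a label
except ValueError as err:
    oov_message = str(err)
    print(oov_message)

# With an OOV slot the lookup no longer fails, but the real mistake stays hidden:
lenient = ToyStringLookup(labels, num_oov_indices=1)
print(lenient("tv future in the hands of viewers"))
```

In this sketch, allowing an OOV slot makes the error disappear without fixing anything, since every article would map to the same unknown index. That is why checking which input reaches the label lookup is more useful than changing num_oov_indices.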

Let me know if you need further help.

Regards
DP

Hi @Jose_Ignacio_Tejera

Kindly get a fresh copy and restart from the beginning; it seems you have done too much editing to your notebook.

Remember that you are only supposed to write code between the assigned markers

###START AND END CODE HERE###

Even between these markers, nothing should be edited other than writing the code that the given instructions ask for.

You have also hard-coded the path in your preprocess data code, which added to the confusion.

In the fit_vectorizer cell, you missed the last line of code, where the vocabulary is built by calling the adapt method of the vectorizer created earlier.

Then you use the vectorizer from this fit_vectorizer cell and the label_encoder from the fit_label_encoder cell to build the dataset, applying each one to its respective input (the vectorizer to the text, the label_encoder to the labels).

Regards
DP

Thank you very much for your help

One last question: how can I get a fresh copy of the assignment? When I open it from the course, it always opens the version I have just modified.

Hi @Jose_Ignacio_Tejera

To get a fresh, up-to-date copy of the assignment:

Click File ==> Open

Select all files except the assignment file and delete them.

Next, select the assignment file and rename it, so that the updated version gets downloaded.

If you do not want to keep the old copy, you can delete the assignment file as well. If you delete it, you will see a 404 error on your screen.

If you rename the file instead of deleting it, you will not get the 404 error.

At this point, click the question mark (?) icon in the top right corner, and then click Reboot.

Once the reboot is done, click the question mark (?) icon again, then click Get the latest Version and then Update Lab.

You now have the latest copy of the assignment to work on.

Regards
DP

Good morning,

I have started it again and it all worked perfectly. Thank you very much for everything.

The error came from the first exercise. I do think it would be good if the person responsible for the notebook could clarify this point.

Grade cell image removed, as it contains code that grades your assignment. This is a violation of community guidelines, as mentioned earlier; kindly refrain from posting code in public posts in future.

In the attached image I have highlighted in yellow two arguments that can be entered in any order. The first time I did the exercise I put train_labels first and train_text second. I did this because just above, when reading an example from the dataset, the order in which it appears is labels first, and then text.

However, nothing in the notebook indicates that this is not the correct order. The second time I did the exercise, I put train_text as the first argument and train_labels as the second, and everything worked perfectly.
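
Why the order matters can be sketched in plain Python. This is a hypothetical illustration and not the notebook's code: `preprocess`, `encode_text`, `encode_label` and the sample data are invented names and values, and the only assumption is that a later step unpacks each pair positionally as (text, label).

```python
def encode_text(text):
    # Placeholder for a real vectorizer: just counts the words.
    return len(text.split())


label_ids = {"tech": 0, "sport": 1}  # hypothetical label vocabulary


def encode_label(label):
    # Strict lookup: anything outside the label vocabulary is an error.
    if label not in label_ids:
        raise ValueError(f"found OOV value {label!r}")
    return label_ids[label]


def preprocess(pairs):
    # Position, not variable name, decides which encoder sees which element.
    return [(encode_text(text), encode_label(label)) for text, label in pairs]


texts = ["tv future in the hands of viewers", "a short sport report"]
labels = ["tech", "sport"]

# Pairs built as (text, label): each encoder receives the input it expects.
good = preprocess(list(zip(texts, labels)))
print(good)

# Pairs built as (label, text): the label encoder is handed a whole article.
try:
    preprocess(list(zip(labels, texts)))
    swapped_error = None
except ValueError as err:
    swapped_error = str(err)
print(swapped_error)
```

Nothing fails at the point where the pairs are built in the wrong order. The failure only appears later, when the label lookup meets a value it has never seen, which matches the shape of the OOV error quoted earlier in the thread.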

thank you again for your help!

Best regards!

Hi @Jose_Ignacio_Tejera

I have checked your code. The code in your graded preprocess cell for the dataset was incorrect because you defined the labels and text separately, which was not required.

You only had to apply the correct returned functions to the text and labels in the dataset, following the way they are laid out in train_val_dataset.

You also missed an important line of code in the graded fit_vectorizer cell, so train_sentences was never used.
In the next line of code, the vectorizer is fitted to the training sentences (remember that the training sentences are named train_sentences, as per the arguments given in the GRADE FUNCTION: fit_vectorizer).

The point you are raising about train_val_dataset is not relevant to the error in your assignment. In train_val_dataset, the training dataset is split into text and labels, and you were required to create the dataset the same way, but using the functions returned by the graded fit_vectorizer and fit_label_encoder cells.

Also, as mentioned earlier, posting grade cell code in a public thread is a violation of community guidelines, and yet you have posted it again. Please refer to the Code of Conduct and How to Use the Forum for a better understanding of the Discourse community.

Regards
DP

Thanks again, and sorry, I didn’t mean to post code in a thread again. When I sent the message, I thought it was a private message. Sorry for the mistake.

Regards