C3W2 Assignment - Validation Set Labels

ktalreja · June 20, 2023, 3:22am

I am trying to create the labels for the training and validation sets in the C3W2 assignment (BBC news archive). I fit the tokenizer on all the labels, then created label_seq as a sequence of split_labels. For some reason the first 5 labels of my training set come out properly, but the first 5 labels of my validation set show [list([]) list([]) list([]) list([]) list([])]. The shape of my validation set is also wrong: it shows (445,) instead of the correct shape. Any idea how to fix this?

balaji.ambresh · June 20, 2023, 3:41am

Odds are good that you are invoking fit_on_sequences on the label tokenizer. Please use the correct method to fit the labels.
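
For readers hitting the same symptom, here is a minimal pure-Python sketch of why the labels can come back as empty lists. `TinyTokenizer` is a hypothetical stand-in written for this note, not the Keras class. It only mimics two behaviours of the Keras `Tokenizer`: the word index is built by `fit_on_texts`, and `texts_to_sequences` silently drops any word missing from that index when no OOV token is set. As far as I can tell, `fit_on_sequences` does not build the word index at all, so every label then maps to `[]`.

```python
from collections import Counter


class TinyTokenizer:
    """Hypothetical stand-in for illustration; not the Keras Tokenizer."""

    def __init__(self):
        self.word_index = {}

    def fit_on_texts(self, texts):
        counts = Counter(w for text in texts for w in text.lower().split())
        # Most frequent word gets index 1; index 0 stays unused, as in Keras.
        ordered = sorted(counts.items(), key=lambda kv: -kv[1])
        self.word_index = {w: i + 1 for i, (w, _) in enumerate(ordered)}

    def texts_to_sequences(self, texts):
        # Words that are not in word_index are dropped without any error.
        return [
            [self.word_index[w] for w in text.lower().split() if w in self.word_index]
            for text in texts
        ]


labels = ["sport", "tech", "sport", "business"]  # made-up sample labels

# Word index never built: every label becomes an empty list.
unfitted = TinyTokenizer()
empty_seqs = unfitted.texts_to_sequences(labels)

# Word index built from the labels: one index per label.
fitted = TinyTokenizer()
fitted.fit_on_texts(labels)
label_seqs = fitted.texts_to_sequences(labels)

print(empty_seqs)
print(label_seqs)
```

If your own output looks like the first printed list, checking that `label_tokenizer.word_index` is non-empty after fitting is a quick way to confirm the cause.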

Florawang · February 20, 2024, 8:53am

Hi, I ran into an issue with this part.
I fit the tokenizer on all the labels and created the sequences from split_labels, but got the following results. I cannot figure out which part is wrong.

First 5 labels of the training set should look like this:
[[87]
 [22]
 [40]
 [40]
 [74]]

First 5 labels of the validation set should look like this:
[[25]
 [26]
 [23]
 [14]
 [14]]

Tokenized labels of the training set have shape: (1780, 1)

Tokenized labels of the validation set have shape: (445, 1)

I got the expected output in all previous parts, could anyone give me a hint? Thanks a lot!

Florawang · February 20, 2024, 9:33am

I tried both fit_on_texts() and fit_on_sequences(); the results are similar.

balaji.ambresh · February 20, 2024, 5:14pm

Please click my name and message your notebook as an attachment in ipynb format.

balaji.ambresh · February 21, 2024, 5:11am

In the function tokenize_labels, you are not using label_tokenizer properly in the calls to fit_on_texts and texts_to_sequences (look for a reference to a global variable).
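
A minimal sketch of the kind of bug described here, for anyone reading later. It is not the assignment code: `build_index` is a hypothetical helper, the data is made up, and the two-argument signature of `tokenize_labels` is an assumption. The point is only that a function which reads a notebook global instead of its own arguments can produce label indices taken from a much larger vocabulary.

```python
def build_index(texts):
    """Hypothetical helper: number each distinct word 1, 2, 3, ... in first-seen order."""
    index = {}
    for text in texts:
        for word in text.split():
            index.setdefault(word, len(index) + 1)
    return index


# Globals defined earlier in a notebook (made-up data for illustration).
sentences = ["markets rally as tech shares climb", "late goal wins sport final"]
labels = ["tech", "sport", "sport"]


def tokenize_labels_buggy(all_labels, split_labels):
    # Bug: ignores all_labels and builds the index from the global `sentences`.
    index = build_index(sentences)
    return [[index[w]] for w in split_labels if w in index]


def tokenize_labels_fixed(all_labels, split_labels):
    # Uses only the arguments passed to the function.
    index = build_index(all_labels)
    return [[index[w]] for w in split_labels]


val_labels = ["sport", "tech"]
buggy = tokenize_labels_buggy(labels, val_labels)
fixed = tokenize_labels_fixed(labels, val_labels)
print(buggy)  # indices come from the article vocabulary
print(fixed)  # indices come from the label vocabulary
```

Label values far larger than the number of classes are a hint that the index was built from something other than the labels.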

Florawang · February 21, 2024, 7:29am

Thank you for your answer!
Do you mean the labels should be a NumPy array? I tried that, but got a similar result. Could there be another reason? Thanks.

Florawang · February 21, 2024, 8:13am

Got it, found the reason! Thanks a lot!

darkemoon · March 2, 2024, 9:44pm

Hi. I am having a similar issue to the one Florawang mentioned above. All previous functions return the expected output, but tokenize_labels does not. I have tried a few different combinations, but so far nothing has helped.

My output:
First 5 labels of the training set should look like this:
[[87] [22] [40] [40] [74]]

Expected Output:
First 5 labels of the training set should look like this:
[[3] [1] [0] [0] [4]]

Not sure where to go next. Thanks.

FedericoChiodarelli · October 16, 2024, 3:18pm

Good afternoon. In the newer version of the C3W2 assignment, I get an additional “None” label among the labels when using StringLookup.
How do I remove that “None” label at position 0, and what causes it? If someone could help me solve this, that would be great. Thank you.

balaji.ambresh · October 16, 2024, 4:10pm

Deepti_Prasad · October 16, 2024, 4:21pm

Hi @FedericoChiodarelli,

Also, please create a new topic whenever you encounter an issue, even if you find a similar thread. A new topic gives you and other learners a better archive of your learning journey, and it avoids confusion for future learners seeking help.

Regards
DP
