C3W2 Assignment - Validation Set Labels

I am trying to create the labels for the training and validation sets in the C3W2 assignment (BBC News archive). I fit the tokenizer on all the labels, then created label_seq by converting split_labels to sequences. The first 5 labels of my training set come out correctly, but the first 5 labels of my validation set show [list([]) list([]) list([]) list([]) list([])]. The shape of my validation set is also wrong: it shows (445,) instead of (445, 1). Any idea how to fix this?

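Odds are good that you are calling fit_on_sequences on the label tokenizer. That method expects sequences of integer indices rather than strings, and it does not build the tokenizer's word index, so the label lookups come back empty. Please use the method that fits the tokenizer on text.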
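To see why the lists come out empty, here is a toy sketch. It assumes a hypothetical ToyLabelTokenizer class that only imitates the idea behind the Keras Tokenizer (string lookups go through word_index, and words missing from it are dropped); it is not the Keras implementation. In Keras, fit_on_sequences does not populate word_index, so a tokenizer fitted only that way has nothing to look the labels up in. The sketch also checks that every row holds exactly one index, which is what lets NumPy build an (n, 1) array. Rows of different lengths (some empty) cannot form a 2-D array; depending on the NumPy version you either get a 1-D object array of Python lists, as in the list([]) output and (445,) shape above, or an error.

```python
# ToyLabelTokenizer is a hypothetical, simplified stand-in for a word-index
# tokenizer; it is NOT the Keras Tokenizer implementation.
class ToyLabelTokenizer:
    def __init__(self):
        self.word_index = {}  # label string -> integer index, starting at 1

    def fit_on_texts(self, texts):
        # Count words, then index them by descending frequency
        # (ties broken alphabetically in this toy).
        counts = {}
        for text in texts:
            for word in text.lower().split():
                counts[word] = counts.get(word, 0) + 1
        ranked = sorted(counts, key=lambda w: (-counts[w], w))
        self.word_index = {word: i + 1 for i, word in enumerate(ranked)}

    def texts_to_sequences(self, texts):
        # Words that are not in word_index are silently dropped.
        return [
            [self.word_index[w] for w in text.lower().split() if w in self.word_index]
            for text in texts
        ]


def is_n_by_1(sequences):
    """True if every row holds exactly one index, i.e. it can form an (n, 1) array."""
    return all(len(seq) == 1 for seq in sequences)


all_labels = ["sport", "business", "sport", "tech", "politics", "sport", "business"]
val_labels = ["tech", "sport", "business"]

# Vocabulary never built from the label strings: every lookup misses.
unfitted = ToyLabelTokenizer()
print(unfitted.texts_to_sequences(val_labels))  # [[], [], []]

# Vocabulary built from ALL label strings: one index per label.
tok = ToyLabelTokenizer()
tok.fit_on_texts(all_labels)
print(tok.word_index)      # {'sport': 1, 'business': 2, 'politics': 3, 'tech': 4}
val_seq = tok.texts_to_sequences(val_labels)
print(val_seq)             # [[4], [1], [2]]
print(is_n_by_1(val_seq))  # True
```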

Hi, I ran into an issue with this part.
I fit the tokenizer on all the labels and created sequences from split_labels, but got the following results. I cannot figure out which part is wrong.

First 5 labels of the training set should look like this:
[[87]
 [22]
 [40]
 [40]
 [74]]

First 5 labels of the validation set should look like this:
[[25]
 [26]
 [23]
 [14]
 [14]]

Tokenized labels of the training set have shape: (1780, 1)

Tokenized labels of the validation set have shape: (445, 1)

I got the expected output in all previous parts. Could anyone give me a hint? Thanks a lot!

I tried both fit_on_texts() and fit_on_sequences() and got similar results.

Please click my name and message your notebook as an attachment in ipynb format.

In the function tokenize_labels, you are not using label_tokenizer properly when calling fit_on_texts and texts_to_sequences (look for a reference to a global variable).
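The hint above is easy to miss because the buggy code can still produce the right shape. The sketch below is pure Python and hypothetical: build_index, text_index, tokenize_labels_buggy and tokenize_labels_fixed are illustrative names, and text_index holds made-up indices standing in for a vocabulary built from the article text. The buggy version builds a label vocabulary but then looks labels up in a notebook-level (global) one, so it returns one index per label in the expected (n, 1) layout, just with the wrong values. The fixed version uses only its own arguments.

```python
from collections import Counter


def build_index(labels):
    """Map each distinct label to an index starting at 1, most frequent first."""
    counts = Counter(labels)
    ranked = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i + 1 for i, label in enumerate(ranked)}


# Hypothetical notebook-level (global) vocabulary built from article TEXT;
# the indices are made up for illustration.
text_index = {"said": 1, "year": 2, "business": 31, "sport": 57, "tech": 64}


def tokenize_labels_buggy(all_labels, split_labels):
    label_index = build_index(all_labels)  # built, but never used below
    # Bug: looks labels up in the GLOBAL text vocabulary instead of label_index.
    return [[text_index[label]] for label in split_labels]


def tokenize_labels_fixed(all_labels, split_labels):
    label_index = build_index(all_labels)  # fit on ALL labels
    # Only the function's own arguments and local vocabulary are used.
    return [[label_index[label]] for label in split_labels]


all_labels = ["sport", "business", "sport", "tech", "business", "sport"]
train_split = ["sport", "business", "tech"]

print(tokenize_labels_buggy(all_labels, train_split))  # [[57], [31], [64]]
print(tokenize_labels_fixed(all_labels, train_split))  # [[1], [2], [3]]
```

Both versions return one-element rows, so a shape check alone passes; only the values reveal the bug. In the Keras version, the analogous fix is to call fit_on_texts and texts_to_sequences on label_tokenizer with the function's own arguments.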

Thank you for your answer!
Do you mean the labels should be a NumPy array? I tried adding that, but got a similar result. Could there be another reason? Thanks

Got it, found the reason! Thanks a lot!

Hi. I am having an issue similar to the one Florawang mentioned above. All previous functions return the expected output, but tokenize_labels does not. I have tried a few different combinations, but nothing has helped so far.

My output:
First 5 labels of the training set should look like this:
[[87] [22] [40] [40] [74]]

Expected output:
First 5 labels of the training set should look like this:
[[3] [1] [0] [0] [4]]

Not sure where to go next. Thanks.
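A quick sanity check for output like [[87] [22] [40] [40] [74]]: the BBC News archive has five categories, so every tokenized label should be a single small index, and values such as 87 most likely come from the wrong vocabulary. Note also that the expected output above runs from 0 to 4, while a Keras Tokenizer numbers its vocabulary from 1. The pure-Python sketch below assumes this version of the notebook wants 0-based labels; to_zero_based, labels_look_valid and the 1-based input are hypothetical, with the input chosen only so that the shift reproduces the expected values.

```python
NUM_CLASSES = 5  # the BBC News archive has five categories


def to_zero_based(sequences):
    # A Keras Tokenizer numbers its vocabulary from 1; shift each index down by one.
    return [[idx - 1 for idx in seq] for seq in sequences]


def labels_look_valid(sequences, num_classes=NUM_CLASSES):
    # Each row should hold exactly one index in the range 0..num_classes-1.
    return all(len(seq) == 1 and 0 <= seq[0] < num_classes for seq in sequences)


one_based = [[4], [2], [1], [1], [5]]  # hypothetical 1-based tokenizer output
zero_based = to_zero_based(one_based)
print(zero_based)                                          # [[3], [1], [0], [0], [4]]
print(labels_look_valid(zero_based))                       # True
print(labels_look_valid([[87], [22], [40], [40], [74]]))   # False
```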