W4: Error with tokenizing and aligning labels in Named Entity Recognition Lab

Ryan_Koo · July 13, 2022, 6:49pm

Hi all,

I am attempting to run this cell:

label_all_tokens = True
def tokenize_and_align_labels(tokenizer, examples, tags):
    tokenized_inputs = tokenizer(examples, truncation=True, is_split_into_words=False, padding='max_length', max_length=512)
    labels = []
    for i, label in enumerate(tags):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            # Special tokens have a word id that is None. We set the label to -100 so they are automatically
            # ignored in the loss function.
            if word_idx is None:
                label_ids.append(-100)
            # We set the label for the first token of each word.
            elif word_idx != previous_word_idx:
                label_ids.append(label[word_idx])
            # For the other tokens in a word, we set the label to either the current label or -100, depending on
            # the label_all_tokens flag.
            else:
                label_ids.append(label[word_idx] if label_all_tokens else -100)
            previous_word_idx = word_idx

        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs

and I am getting the following error:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/util/structure.py in normalize_element(element)
     92       try:
---> 93         spec = type_spec_from_value(t, use_fallback=False)
     94       except TypeError:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/util/structure.py in type_spec_from_value(element, use_fallback)
    465   raise TypeError("Could not build a TypeSpec for %r with type %s" %
--> 466                   (element, type(element).__name__))
    467 

TypeError: Could not build a TypeSpec for [[101, 11113, 24158, 5369, 2243, 1046, 3270, 4646, 2458, 5482, 1011, 9669, 5397, 8191, 14129, 1010, 12092, 1011, 10373, 2033, 2006, 5262...

I am not sure why this is happening. Any help is appreciated!

anon57530071 · July 15, 2022, 8:38am

Welcome to the community.

[[101, 11113, 24158, 5369, 2243, 1046, 3270, 4646, 2458, 5482, 1011, 9669, 5397, 8191, 14129, 1010, 12092, 1011, 10373, 2033, 2006, 5262...

Those are valid token IDs: they are what tokenizer() produces for the first “content” entry in df_data.

Are you using the Coursera environment and encountering this issue there?

@Mubsi,
This exercise takes no input from the learner; the learner just runs the cells in order.
There is nothing we mentors can do about it. Could you take a look from the Deeplearning.ai side?

This is W4A2_UGL.

Mubsi · July 15, 2022, 11:09am

Hi @anon57530071 & @Ryan_Koo,

I tried running the notebook on my end and everything worked as it should.

@Ryan_Koo, did you change any of the code in the notebook? In any case, fetch the latest version of the notebook: click the Help option in the top-right corner and, when the panel opens, click the “get latest lab” button.

Let me know if you still run into this issue.

Best,
Mubsi

Ryan_Koo · July 16, 2022, 6:51pm

Yep, I don’t know what went wrong but refreshing the lab seemed to work.

Thanks!
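
For anyone landing on this thread later: the alignment loop from the cell in the first post can be tried without a tokenizer. The sketch below is only an illustration, not the lab's code. The helper name align_labels, the hand-written word_ids list, and the label values are all made up; in the lab, word_ids comes from tokenized_inputs.word_ids(batch_index=i).

```python
def align_labels(word_ids, word_labels, label_all_tokens=True):
    """Map per-word labels onto per-token positions (hypothetical helper)."""
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens ([CLS], [SEP], padding) get -100 so the loss ignores them.
            label_ids.append(-100)
        elif word_idx != previous_word_idx:
            # First token of a word always takes that word's label.
            label_ids.append(word_labels[word_idx])
        else:
            # Later sub-tokens of the same word: repeat the label or mask it out.
            label_ids.append(word_labels[word_idx] if label_all_tokens else -100)
        previous_word_idx = word_idx
    return label_ids


# Made-up input: three words split into 2, 1, and 3 sub-tokens,
# wrapped in two special tokens.
word_ids = [None, 0, 0, 1, 2, 2, 2, None]
word_labels = [3, 0, 5]  # one (made-up) label id per word

all_tokens = align_labels(word_ids, word_labels, label_all_tokens=True)
first_only = align_labels(word_ids, word_labels, label_all_tokens=False)

print(all_tokens)  # [-100, 3, 3, 0, 5, 5, 5, -100]
print(first_only)  # [-100, 3, -100, 0, 5, -100, -100, -100]
```

With label_all_tokens = True, as in the cell above, every sub-token of a word carries the word's label; with False, only the first sub-token does and the rest are masked with -100.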
