W4: Error with tokenizing and aligning labels in Named Entity Recognition Lab

Hi all,

I am attempting to run this cell:

label_all_tokens = True
def tokenize_and_align_labels(tokenizer, examples, tags):
    tokenized_inputs = tokenizer(examples, truncation=True, is_split_into_words=False, padding='max_length', max_length=512)
    labels = []
    for i, label in enumerate(tags):
        word_ids = tokenized_inputs.word_ids(batch_index=i)
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            # Special tokens have a word id that is None. We set the label to -100 so they are automatically
            # ignored in the loss function.
            if word_idx is None:
                label_ids.append(-100)
            # We set the label for the first token of each word.
            elif word_idx != previous_word_idx:
                label_ids.append(label[word_idx])
            # For the other tokens in a word, we set the label to either the current label or -100, depending on
            # the label_all_tokens flag.
            else:
                label_ids.append(label[word_idx] if label_all_tokens else -100)
            previous_word_idx = word_idx

        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs

and I am getting the following error:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/util/structure.py in normalize_element(element)
     92       try:
---> 93         spec = type_spec_from_value(t, use_fallback=False)
     94       except TypeError:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/util/structure.py in type_spec_from_value(element, use_fallback)
    465   raise TypeError("Could not build a TypeSpec for %r with type %s" %
--> 466                   (element, type(element).__name__))
    467 

TypeError: Could not build a TypeSpec for [[101, 11113, 24158, 5369, 2243, 1046, 3270, 4646, 2458, 5482, 1011, 9669, 5397, 8191, 14129, 1010, 12092, 1011, 10373, 2033, 2006, 5262...

I am not sure why this is happening. Any help is appreciated!

Welcome to the community.

[[101, 11113, 24158, 5369, 2243, 1046, 3270, 4646, 2458, 5482, 1011, 9669, 5397, 8191, 14129, 1010, 12092, 1011, 10373, 2033, 2006, 5262...

Those are the correct token IDs, produced by tokenizer() from the first “content” entry in df_data.

Are you using the Coursera environment, and are you still encountering this issue?
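For reference, the label-alignment loop from the cell can be checked on its own, without a tokenizer. The following is a minimal sketch, not the lab's code: `align_labels`, `toy_word_ids`, and `toy_labels` are hypothetical names, and the word-id list is hand-written rather than taken from `tokenized_inputs.word_ids()`.

```python
IGNORE_INDEX = -100  # same "ignore in the loss" value the cell uses


def align_labels(word_ids, word_labels, label_all_tokens=True):
    """Spread per-word labels over per-token positions (hypothetical helper)."""
    label_ids = []
    previous_word_idx = None
    for word_idx in word_ids:
        if word_idx is None:
            # Special tokens have no word id, so they get the ignore value.
            label_ids.append(IGNORE_INDEX)
        elif word_idx != previous_word_idx:
            # First token of a word always carries the word's label.
            label_ids.append(word_labels[word_idx])
        else:
            # Later tokens of the same word depend on the flag.
            label_ids.append(word_labels[word_idx] if label_all_tokens else IGNORE_INDEX)
        previous_word_idx = word_idx
    return label_ids


# Made-up input: three words, where words 0 and 2 are split into two tokens each,
# surrounded by two special tokens (word id None).
toy_word_ids = [None, 0, 0, 1, 2, 2, None]
toy_labels = [3, 0, 5]

all_tokens = align_labels(toy_word_ids, toy_labels, label_all_tokens=True)
first_only = align_labels(toy_word_ids, toy_labels, label_all_tokens=False)
print(all_tokens)   # [-100, 3, 3, 0, 5, 5, -100]
print(first_only)   # [-100, 3, -100, 0, 5, -100, -100]
```

Either way, the result has one label per token, so its length matches the length of the word-id list.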

@Mubsi,
This exercise requires no input from the learner; the learner just runs the cells.
There is nothing we mentors can do about this. Could you take a look from the DeepLearning.AI side?

This is W4A2_UGL.

Hi @anon57530071 & @Ryan_Koo,

I tried running the notebook on my end and everything worked as it should.

@Ryan_Koo, did you change any of the code in the notebook? In any case, fetch the latest version of the notebook: press the Help option in the top right corner and, when the panel opens, press the get latest lab button.

Let me know if you still run into this issue.

Best,
Mubsi

Yep, I don’t know what went wrong, but refreshing the lab seemed to work.

Thanks!