Lab2: Error message in function

Jesper156 · April 16, 2024, 11:15am

I was running the second lab when the following error occurred in the user-defined function "def tokenize_function(example):":

TypeError: Provided function which is applied to all elements of table returns a dict of types [<class 'list'>, <class 'list'>, <class 'torch.Tensor'>, <class 'torch.Tensor'>]. When using batched=True, make sure provided function returns a dict of types like (<class 'list'>, <class 'numpy.ndarray'>, <class 'pandas.core.series.Series'>).

It seems that tokenize_function(example) does not return the right data types. Luckily, I could skip that step by loading the pretrained model…

Charlie_DataScience · April 17, 2024, 7:03pm

It seems that there's a problem with the function's outputs. You may need to modify the input parameters or check the return statement of tokenize_function to make sure it returns the correct data types. You can also check the documentation of the methods it calls to confirm which types they return.
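
One way to act on that advice before calling map() is to run the function on a tiny hand-made batch and compare the type of each returned column with the types the error message accepts. The sketch below is a hypothetical debugging helper (type_name and check_batched_output are names made up for this example, not part of the datasets library) that only mimics the rule quoted in the error. It compares type names, so it runs without NumPy, pandas or PyTorch installed:

```python
# Hypothetical debugging helper: it only mimics the rule quoted in the error
# message and is NOT how the datasets library implements its own check.
ALLOWED_TYPE_NAMES = {"list", "numpy.ndarray", "pandas.core.series.Series"}


def type_name(value):
    """Return a qualified type name such as 'list', 'numpy.ndarray' or 'torch.Tensor'."""
    cls = type(value)
    if cls.__module__ == "builtins":
        return cls.__qualname__
    return f"{cls.__module__}.{cls.__qualname__}"


def check_batched_output(output):
    """Return {column: type name} for a batched map output, or raise TypeError
    if any column has a type outside ALLOWED_TYPE_NAMES."""
    types = {key: type_name(value) for key, value in output.items()}
    bad = {key: name for key, name in types.items() if name not in ALLOWED_TYPE_NAMES}
    if bad:
        raise TypeError(f"Unsupported column types for batched=True: {bad}")
    return types


if __name__ == "__main__":
    # Stand-in batch with made-up text; every column is a plain list, so it passes.
    batch = {"dialogue": ["Hello, how are you?"], "summary": ["A greeting."]}
    print(check_batched_output(batch))  # {'dialogue': 'list', 'summary': 'list'}
```

In the notebook, passing the output of tokenize_function for a few rows to check_batched_output should raise the TypeError for input_ids and labels as long as they are still torch.Tensor objects.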

If that doesn't help, you can always restart the lab to get the original code back, which should work if you have the correct versions of the libraries. If the error still persists, please let me know so we can raise a ticket and check if there's a problem with the notebook used in the lab.

natalan · May 2, 2024, 2:08pm

I fixed it by converting the PyTorch tensors to lists before returning them from tokenize_function:

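# Assumes `tokenizer` was already loaded earlier in the lab notebook
# (e.g. with AutoTokenizer.from_pretrained); applied via dataset.map(..., batched=True).
def tokenize_function(example):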
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    
    input_ids_tensor = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    labels_tensor = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids

    # Convert the PyTorch tensors to lists
    example['input_ids'] = input_ids_tensor.numpy().tolist()
    example['labels'] = labels_tensor.numpy().tolist()
    return example
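
A more generic version of the same idea, as a minimal sketch: convert every returned value that has a .tolist() method into a plain Python list. torch.Tensor and numpy.ndarray both provide .tolist(), so calling it directly also makes the intermediate .numpy() step unnecessary. The helper name to_list_columns is hypothetical, and array.array is used only as a stand-in for a tensor so the example runs without PyTorch:

```python
from array import array


def to_list_columns(batch):
    """Return a copy of `batch` in which every value that has a .tolist()
    method (torch.Tensor and numpy.ndarray both do) becomes a plain list."""
    return {
        key: value.tolist() if hasattr(value, "tolist") else value
        for key, value in batch.items()
    }


if __name__ == "__main__":
    # array.array also has .tolist(), so it stands in for a tensor here.
    batch = {"input_ids": array("i", [101, 102]), "dialogue": ["Hello"]}
    print(to_list_columns(batch))  # {'input_ids': [101, 102], 'dialogue': ['Hello']}
```

Inside the lab's function this would mean ending with return to_list_columns(example). Another option is to leave out return_tensors="pt" in both tokenizer calls, since Hugging Face tokenizers return Python lists when no tensor type is requested.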

Topic	Replies	Views	Activity
Tokenizer Error on batched=True When Using Different Cloud Service Generative AI with Large Language Models week-2	1	420	May 12, 2024
[SOLVED] Potential issue with tokenize_function in week2 lab Generative AI with Large Language Models week-2	1	132	May 26, 2024
Week 2 Lab: Train Error - Solved Generative AI with Large Language Models week-2	1	432	July 11, 2023
Data generator exercise NLP with Sequence Models week-2	2	508	January 21, 2023
NLP C3W2 Assignment - Error in unit tests for data_generator function NLP with Sequence Models week-2	14	652	May 8, 2022
