Lab2: Error message in function

I was running the second lab when the user-defined function tokenize_function(example) raised this error:

TypeError: Provided function which is applied to all elements of table returns a dict of types [<class 'list'>, <class 'list'>, <class 'torch.Tensor'>, <class 'torch.Tensor'>]. When using batched=True, make sure provided function returns a dict of types like (<class 'list'>, <class 'numpy.ndarray'>, <class 'pandas.core.series.Series'>).

It seems that tokenize_function(example) does not return the right data types. Luckily, I could skip past that by loading the pretrained model…

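It seems that there's a problem with the outputs of the function. The error message shows that two of the returned values are torch.Tensor objects, while with batched=True the function is expected to return types like lists, NumPy arrays, or pandas Series. You may need to modify the input parameters or check the return statement of tokenize_function to ensure it returns the correct data types. You can also check the documentation of the methods it calls to see whether they return the correct types.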
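To see which keys in the returned dict are the problem, a quick check along these lines might help. This is only a sketch: find_unsupported_columns is a hypothetical helper, the allowed type names are copied from the error message rather than from the library's own validation code, and a stand-in Tensor class is used so the snippet runs without PyTorch installed.

```python
# Type names copied from the error message; an illustrative check only,
# not the library's actual validation logic.
ALLOWED_TYPE_NAMES = {"list", "ndarray", "Series"}


def find_unsupported_columns(batch):
    """Return {column: type name} for values whose type is not in the allowed set."""
    return {
        key: type(value).__name__
        for key, value in batch.items()
        if type(value).__name__ not in ALLOWED_TYPE_NAMES
    }


class Tensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""


# A batch shaped like the one in the error: two lists, two tensors.
batch = {
    "dialogue": ["a"],
    "summary": ["b"],
    "input_ids": Tensor(),
    "labels": Tensor(),
}
print(find_unsupported_columns(batch))
# prints {'input_ids': 'Tensor', 'labels': 'Tensor'}
```

In the notebook, the same helper could be called on the result of tokenize_function for a small slice of the dataset before running the full map.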

If not, you can always restart the lab and get the original code, which should work if you have the correct versions of the libraries. If the error still persists, please let me know so we can raise a ticket and check if there’s a problem with the notebook used in the lab.

I fixed it by converting the PyTorch tensors to lists before returning them in the tokenize_function:

def tokenize_function(example):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    
    input_ids_tensor = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    labels_tensor = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids

    # Convert the PyTorch tensors to nested Python lists (tolist() works directly on a tensor)
    example['input_ids'] = input_ids_tensor.tolist()
    example['labels'] = labels_tensor.tolist()
    return example
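
An alternative to converting afterwards, sketched here under assumptions: if return_tensors="pt" is left out, a Hugging Face tokenizer is expected to return plain Python lists, so no conversion step should be needed. The make_tokenize_function wrapper is a hypothetical helper, added only so the tokenizer is passed in explicitly instead of being read from a global; it is not part of the lab notebook.

```python
def make_tokenize_function(tokenizer):
    """Build a batched tokenize function around the given tokenizer (hypothetical helper)."""

    def tokenize_function(example):
        start_prompt = "Summarize the following conversation.\n\n"
        end_prompt = "\n\nSummary: "
        prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]

        # No return_tensors argument: the tokenizer output is assumed to
        # contain lists of token ids rather than torch tensors.
        example["input_ids"] = tokenizer(
            prompt, padding="max_length", truncation=True
        )["input_ids"]
        example["labels"] = tokenizer(
            example["summary"], padding="max_length", truncation=True
        )["input_ids"]
        return example

    return tokenize_function
```

It would be used as dataset.map(make_tokenize_function(tokenizer), batched=True), assuming tokenizer and dataset are the objects already loaded in the notebook.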