C3W2 preprocess_dataset - dataset.map

Hello.

All the unit tests up to “GRADED FUNCTION: preprocess_dataset” pass, but I don’t seem to understand how to correctly use the map method, the lambda function, or the dataset structure. I have looked at other posts but have not found the answer.

As I understand the task:

We receive the dataset as an argument; it consists of training texts and training labels. Our task is to use the previously created vectorizer and label encoder to process the texts and labels, respectively. The vectorizer and the encoder are already trained by the time they are passed to this function, so we don’t need to train them.

After that, we need to group the dataset into batches of 32.
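A minimal sketch of the pipeline described above, assuming the vectorizer is a tf.keras.layers.TextVectorization layer and the label encoder is a tf.keras.layers.StringLookup layer, both already adapted. The function name preprocess_dataset_sketch and the toy texts and labels are hypothetical, and this is not the assignment’s reference solution. It uses a named function instead of a lambda: .map unpacks each (text, label) element into the function’s two arguments, and returning a pair keeps that structure intact.

```python
import tensorflow as tf


def preprocess_dataset_sketch(dataset, text_vectorizer, label_encoder, batch_size=32):
    """Vectorize texts and encode labels element by element, then batch.

    Assumes each element of `dataset` is a (text, label) pair and that both
    layers were already adapted, so nothing is trained here.
    """

    def vectorize_pair(text, label):
        # .map() unpacks each (text, label) element into these two arguments;
        # returning a tuple keeps the (features, label) structure.
        return text_vectorizer(text), label_encoder(label)

    return dataset.map(vectorize_pair).batch(batch_size)


# Hypothetical toy data standing in for the assignment's texts and labels.
texts = ["the market rallied today", "the team won the match", "a new phone was released"]
labels = ["business", "sport", "tech"]

# Assumption: a TextVectorization layer and a StringLookup layer, both adapted beforehand.
text_vectorizer = tf.keras.layers.TextVectorization(output_sequence_length=8)
text_vectorizer.adapt(tf.constant(texts))
label_encoder = tf.keras.layers.StringLookup(num_oov_indices=0)
label_encoder.adapt(tf.constant(labels))

dataset = tf.data.Dataset.from_tensor_slices((texts, labels))
batched = preprocess_dataset_sketch(dataset, text_vectorizer, label_encoder)

for x, y in batched.take(1):
    print(x.shape, y.shape)
```

With this three-example toy dataset, the single batch holds texts padded to length 8 and one integer label per example.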

I am trying to follow the hint to use the .map method with the following command:

    dataset = dataset.map(lambda a, b, : text_vectorizer(a), label_encoder(b))

Alternative text description, in case posting the code is against the rules: a lambda function passed as an argument to the map method. The lambda function takes two arguments and applies text_vectorizer to the first and label_encoder to the second.

However, I get the error “name ‘b’ is not defined”, which seems to mean that I cannot extract the labels that way.

Is one of my assumptions incorrect, or is this an incorrect way to handle the dataset?

Hi @swat6296,

This comment post explains how to use the dataset and map to process your data; please go through it. I have also described there how to correct the code. If your code matches what I described and you still get a “not defined” error for the label, then the train_val_dataset code needs to be checked.

Let me know if you still want me to review your code.

Regards

DP

After googling the lambda syntax, I found the issue:
Python reads the comma after the colon as a separator between arguments to .map, not as part of the lambda. So “lambda a, b: text_vectorizer(a)” becomes the first argument to .map, and “label_encoder(b)” becomes a second argument that is evaluated on its own, outside the lambda, where b is not defined. To avoid this, the lambda body has to be wrapped in parentheses so that it returns a single tuple.
That is, the correct syntax in such a case is:

    result = dataset.map(lambda a, b: (func1(a), func2(b)))
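The same parsing behavior can be reproduced in plain Python, without TensorFlow. In this sketch, count_args is a hypothetical stand-in for .map that only counts the positional arguments it receives, and func1/func2 are toy placeholders.

```python
def count_args(*args):
    """Hypothetical stand-in for .map: report how many positional arguments arrived."""
    return len(args)


def func1(x):
    return x + 1  # toy placeholder


def func2(x):
    return x * 10  # toy placeholder


def unparenthesized():
    # Parsed as count_args(<lambda a, b: func1(a)>, func2(b)).
    # func2(b) is evaluated right away, outside the lambda, where b does not exist.
    return count_args(lambda a, b: func1(a), func2(b))


error_message = None
try:
    unparenthesized()
except NameError as err:
    error_message = str(err)
print(error_message)  # name 'b' is not defined

# With parentheses the lambda returns one tuple, so only one argument is passed.
parenthesized = lambda a, b: (func1(a), func2(b))
print(count_args(parenthesized))  # 1
print(parenthesized(1, 2))        # (2, 20)
```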
