When I map text_vectorizer over this dataset, each element has two inputs, but the vectorizer expects only one. Do I have to split the _TensorSliceDataset again? If so, how?
Conversely, the function label_encoder (last argument in the function preprocess_dataset) takes in two arguments:
Both of these arguments (train_labels, validation_labels) are labels, one from the training set and one from the validation set. But when the preprocess_dataset function is called, only one set is given:
This train_dataset contains only train text and train labels, so the function fit_label_encoder can only receive train labels – i.e. the preprocess_dataset function is lacking data for the validation_labels (it would also make no sense to have validation labels together with train labels…).
Could you please have a look at the function and give advice? Many Thanks!
preprocess_dataset is provided text_vectorizer and label_encoder adapted to the correct split(s) of the dataset.
Each entry in train_dataset is a tuple with structure (text, label).
To access a field of the tuple in each row, see this example: text_only_dataset = train_dataset.map(lambda text, label: text). Use this information to apply the correct transformations and return a tuple of encoded text and encoded label.
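To illustrate how map handles a (text, label) tuple, here is a minimal sketch on toy data. The sample strings and the use of tf.strings.upper and tf.strings.length are assumptions chosen only to show the pattern; they are not the transformations the assignment asks for.

```python
import tensorflow as tf

# Toy data standing in for the assignment's train_dataset (hypothetical values).
texts = ["stocks rally on earnings", "team wins the final"]
labels = ["business", "sport"]
train_dataset = tf.data.Dataset.from_tensor_slices((texts, labels))

# Each element is a (text, label) tuple, so the lambda receives two arguments.
text_only_dataset = train_dataset.map(lambda text, label: text)
label_only_dataset = train_dataset.map(lambda text, label: label)

# A single map call can also return a tuple, applying a different
# function to each field. The two functions here are placeholders.
transformed_dataset = train_dataset.map(
    lambda text, label: (tf.strings.upper(text), tf.strings.length(label))
)

first_text = next(iter(text_only_dataset)).numpy().decode("utf-8")
first_pair = next(iter(transformed_dataset))
```

The point of the sketch is that no splitting of the dataset is needed: the lambda unpacks each row, and whatever tuple it returns becomes the new row.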
In addition to what the other mentor mentioned, refer to the ungraded labs. They will help you see how train_labels and validation_labels are used in fit_label_encoder, and why train_proc_dataset uses only train_dataset.
Basically, in the graded function preprocess_dataset you create a dataset of (text, label) pairs in which the texts are vectorized and the labels are encoded with the fitted label encoder. These functions are preprocessing steps that normalize the data for better performance.
In fit_label_encoder, you concatenate the train labels with the validation labels, then create the encoder with tf.keras.layers.StringLookup, following the instruction in the graded cell not to include any OOV tokens.
Finally, you fit the encoder to all the labels.
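As a rough illustration of those steps outside the assignment, the sketch below concatenates two toy label datasets and adapts a StringLookup layer on the result. The label values are made up, and num_oov_indices=0 is my assumption for how "no OOV tokens" is expressed; check the instructions in your graded cell.

```python
import tensorflow as tf

# Toy label sets (hypothetical values, not the assignment data).
train_labels = tf.data.Dataset.from_tensor_slices(["sport", "business", "tech"])
validation_labels = tf.data.Dataset.from_tensor_slices(["tech", "sport"])

# Join both splits so the encoder sees every label at least once.
all_labels = train_labels.concatenate(validation_labels)

# num_oov_indices=0 reserves no index for out-of-vocabulary tokens (assumption).
label_encoder = tf.keras.layers.StringLookup(num_oov_indices=0)
label_encoder.adapt(all_labels.batch(8))  # batched for adapt

vocab = label_encoder.get_vocabulary()
encoded = label_encoder(tf.constant(["sport", "tech"]))
```

Because the encoder is adapted on labels from both splits, the same fitted encoder can later be applied to the training set and the validation set separately.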
For preprocess_dataset, follow the steps the other mentor described. There is also a test cell before the graded preprocess_dataset cell that can guide you, but it is not a direct hint: in preprocess_dataset you need to handle text and label together, as a (text, label) pair.
Hi, I’m also struggling with this section of the assignment. I have tried many variations of the commands I think are involved. This is the closest I have come. It gives the correct outcome for the immediately subsequent cell but then gives the wrong shape in the next cell.
Posting graded cell code is against the community guidelines. Kindly refrain from posting code, and refer to the Code of Conduct in the FAQ.
Please could someone help explain where I am going wrong?
You are not supposed to post code in a public thread. Whenever you encounter an issue, kindly create a new post with a screenshot of the error, without sharing any graded cell code.
For a better understanding, refer to the Code of Conduct in the FAQ section.
Hi,
Apologies, this is the first time I have tried to do this and did not realise.
Unfortunately, I don't seem to be able to upload images either; I keep getting an error.
The issue I am having is that the shape of the batches is coming out as (32,) instead of (32, 120).
Can you take a screenshot of the error you are mentioning, @mallen, so I get a better understanding of where your code might be going wrong?
If it's a lengthy error log, you can take two separate screenshots and post them. Also confirm whether the unit test for your previous graded cell, fit_label_encoder, passed. If not, share a screenshot of the output you got when you ran that unit test cell.
Unfortunately, I cannot seem to upload an image. I keep getting errors and have tried multiple browsers.
These are the errors I get at the unittests. All previous cells have passed all unittest cells successfully.
Failed test case: Got wrong data type for the preprocessed texts.
Expected: int64
Got: object
Failed test case: Got wrong data type for the preprocessed labels.
Expected: int64
Got: object
Failed test case: Got wrong shape for the preprocessed texts. Make sure that MAX_LENGTH is set to 120 before submitting.
Expected: (32, 120)
Got: (32,)
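For anyone reading these messages later, the sketch below shows on toy data what they usually indicate: a batch of raw strings has shape (batch,) and NumPy dtype object, while a batch that has passed through a TextVectorization layer has shape (batch, MAX_LENGTH) and an integer dtype. The sample strings and the way the layer is configured here are assumptions for illustration, not the assignment's code.

```python
import tensorflow as tf

MAX_LENGTH = 120  # value named in the failed test message

# Toy batch of raw text (hypothetical values).
texts = tf.constant(["stocks rally on earnings", "team wins the final"])

# Illustrative vectorizer; output_sequence_length pads or truncates each row.
vectorizer = tf.keras.layers.TextVectorization(output_sequence_length=MAX_LENGTH)
vectorizer.adapt(tf.data.Dataset.from_tensor_slices(texts).batch(2))

raw_batch = texts.numpy()                     # strings, never vectorized
vectorized_batch = vectorizer(texts).numpy()  # integer token ids

print(raw_batch.shape, raw_batch.dtype)
print(vectorized_batch.shape, vectorized_batch.dtype)
```

So a "Got: object" together with "Got: (32,)" suggests the texts reaching the test are still plain strings, i.e. the transformation was not actually applied inside the mapped function.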
Since you mentioned that you passed the previous unit test cell, I am only sharing guidance for the graded cell you are currently having trouble with. If it throws an error again after the correction, then the code in the previous graded cell needs to be looked at; let me know if that happens.
For now, please refer to the comment below. Remember that the max length is passed to the vectorizer in the previous graded cell, which is why I asked whether your previous unit test cell passed. Setting the max length is not part of preprocess_dataset.
You only need to use a lambda, and in it apply text_vectorizer to the text and, in the same way, label_encoder to the label.
If doing this throws an IOPub rate limit error, that means your previous graded cells are incorrect.
You can send me that code cell by personal DM, and also tell me how you corrected the preprocess code.
I ask for a screenshot of the error or code not for my own benefit, but because minor syntax errors get missed with copy and paste, and learners end up hunting for the issue. This happened when I was helping another learner who had missed a ) in his code.
Here the labels and the text were written separately, in two dataset lines of code, but you need to write them in one line of code.
Another difference in this image is that they have not used the label encoder from fit_label_encoder, which you need to use when writing the dataset code.
The lambda names the text first, with its function, i.e. text_vectorizer, and then the label, with its function, i.e. label_encoder. You had used label instead of labels.
This is a direct hint; beyond this I would have to give you the written code directly.