Hello @Siddhartha_Dey
THINGS TO CHECK BASED ON YOUR ERROR:
I hope your output matches the expected output for the line_to_tensor grader cell and that you have passed w1_unittest.test_line_to_tensor(line_to_tensor). (MAKE SURE YOU HAVE APPLIED THE RIGHT DATA TYPE ON THE CODE LINE FOR chars.)
You can verify this against the cell under
1.3 - Convert a Line to Tensor
which shows which data type needs to be used.
Next, in create_batch_dataset:
1. Make sure the code line below correctly calls line_to_tensor, the function from the previous grader cell, so that your data is converted using the single line of data and the vocabulary list:
### Convert your data into a tensor using the given vocab
2. If point 1 is done correctly, the converted data tensor from point 1 is used to create a TensorFlow dataset using tf.data.Dataset.from_tensor_slices.
3. If points 1 and 2 are done correctly, you now create a batch using the sequence length and drop_remainder. The important pointers for this code line are:
Make the dataset produce batches of data that will form a single sample each time. That is, make the dataset produce a sequence of seq_length + 1 elements, rather than single numbers at each time. You can do it using the batch function of the already created dataset. You must specify the length of the produced sequences (seq_length + 1). So, the sequence length produced by the dataset will be seq_length + 1. It must have that extra element since you will get the input and the output sequences out of the same element. drop_remainder=True will drop the sequences that do not have the required length. This could happen each time that the dataset reaches the end of the input sequence.
4. If points 1, 2 and 3 are correct, write the data_generator.map code by following the instructions below:
Use split_input_target to split each element produced by the dataset into the mentioned input and output sequences. The input will have the first seq_length elements, and the output will have the last seq_length. So, after this step, the dataset generator will produce batches of pairs of (input, output) sequences.
5. The final, important step is to shuffle the dataset. (REMEMBER THAT IN THIS SECTION THE CODE WAS ALREADY GIVEN, BUT SOME PARTS HAD TO BE WRITTEN BY THE CODER.)
Starting from the dataset produced by the previous line, apply buffer_size when you shuffle, so that the data is shuffled first and then reshuffled on each iteration. The batch is then defined by batch_size and drop_remainder, as you did earlier when creating the sequences.
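To make points 3 and 4 easier to picture, here is a small plain-Python sketch of what the shapes should look like. It is not the assignment code and does not use tf.data at all: chunk_ids and split_pair are hypothetical helper names I made up for illustration, and the toy ids list stands in for the tensor you get from line_to_tensor. It only mimics the idea of grouping into seq_length + 1 elements with the remainder dropped, then splitting each group into input and target.

```python
# Plain-Python illustration only (hypothetical helpers, not the graded solution).

def chunk_ids(ids, seq_length):
    """Group ids into chunks of seq_length + 1 and drop a short final chunk,
    mimicking the effect of batching with drop_remainder=True."""
    size = seq_length + 1
    n_full = len(ids) // size  # only complete chunks are kept
    return [ids[i * size:(i + 1) * size] for i in range(n_full)]


def split_pair(chunk):
    """Input is the first seq_length elements, target is the last seq_length."""
    return chunk[:-1], chunk[1:]


ids = list(range(10))  # toy stand-in for the converted data
seq_length = 3

chunks = chunk_ids(ids, seq_length)
pairs = [split_pair(c) for c in chunks]

print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7]]  (8 and 9 are dropped)
print(pairs)   # [([0, 1, 2], [1, 2, 3]), ([4, 5, 6], [5, 6, 7])]
```

Notice that each chunk has seq_length + 1 elements, while the input and the target each have seq_length elements and are shifted by one position. If your shapes in the notebook do not follow this pattern, the batch length in point 3 is the first thing to check.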
Let me know whether this resolves your issue or if you still have doubts.
Regards
DP