Natural Language Processing Specialization: Course 3 (Sequence Models) Week 1 Assignment Code Issue

My create_batch_dataset() solution is not passing unit test case 1. It produces the expected output shown in the example, but when Unit Test 1 runs, the first batch has length 0 and the dataset returns nothing. It might be related to the sequence length or batch_size, but I am not sure what the actual problem is.
I am stuck at this point in the course and can't move ahead because of this blocker. Any help will be highly appreciated!
{Assignment link removed by moderator - AGAINST COMMUNITY GUIDELINES}

Kindly share images of your error and output only.
**Do not share graded code or notebook assignments** on a public post thread. It is against community guidelines.

Regards
DP

Output: [screenshot not preserved]

Error: [screenshot not preserved]

Hello @Siddhartha_Dey

THINGS TO CHECK BASED ON YOUR ERROR:

First, I hope your line_to_tensor grader cell gives the same output as the expected output and passes w1_unittest.test_line_to_tensor(line_to_tensor). (MAKE SURE YOU HAVE APPLIED THE RIGHT DATA TYPE IN THE chars CODE LINE.)

You can verify this code against the cell under 1.3 - Convert a Line to Tensor, which shows which data type needs to be used.
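Why the chars data type matters can be shown with a plain-Python sketch (not the graded code: the notebook's line_to_tensor works with TensorFlow tensors, and the name `line_to_ids` here is hypothetical). The idea is that the line must be split into individual characters before lookup, so that each character becomes its own element.

```python
def line_to_ids(line, vocab):
    """Hypothetical pure-Python stand-in for line_to_tensor: one integer id per character.

    Assumes every character of `line` appears in `vocab` (no out-of-vocabulary handling).
    """
    char_to_id = {ch: i for i, ch in enumerate(vocab)}  # character -> position in vocab
    chars = list(line)  # split the line into single characters first
    return [char_to_id[ch] for ch in chars]


vocab = sorted(set("hello world"))  # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
print(line_to_ids("hello", vocab))  # [3, 2, 4, 4, 5]
```

A correctly split line of n characters yields n elements. If the whole line were treated as a single element, a later step that groups elements into sequences of seq_length + 1 would have almost nothing to work with.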

Next, in create_batch_dataset:

  1. Make sure the code line below correctly calls the line_to_tensor function from the previous grader cell, so that you convert the data (using the single line of data and the vocabulary list)
    ###Convert your data into a tensor using the given vocab

  2. If point 1 is done correctly, the data tensor converted in point 1 is used to create a TensorFlow dataset with tf.data.Dataset.from_tensor_slices.

  3. If points 1 and 2 are done correctly, you now create batches using the sequence length and drop_remainder. Important pointers for this code line are:
    Make the dataset produce batches of data that will form a single sample each time. That is, make the dataset produce sequences of seq_length + 1 elements, rather than single numbers, at each step. You can do this with the batch function of the already created dataset. You must specify the length of the produced sequences (seq_length + 1), so the sequence length produced by the dataset will be seq_length + 1. It needs that extra element because you will get both the input and the output sequences out of the same element. drop_remainder=True drops any sequence that does not have the required length, which can happen each time the dataset reaches the end of the input sequence.

  4. If points 1, 2 and 3 are correct, write the data_generator.map code by following the instructions below:
    Use split_input_target to split each element produced by the dataset into the input and output sequences. The input will have the first seq_length elements, and the output will have the last seq_length elements. So, after this step, the dataset generator will produce batches of (input, output) sequence pairs.

  5. The final and important step is to shuffle the dataset (REMEMBER: IN THIS SECTION THE CODE WAS ALREADY GIVEN, BUT SOME PARTS HAD TO BE WRITTEN BY THE CODER).
    As with the function recalled on the previous line, you need to pass buffer_size to the shuffle (the dataset is shuffled first and reshuffled on each iteration), and then batch it using batch_size and drop_remainder, just as you did when creating the batched dataset in point 3.
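To see how the first batch can come out empty, here is a sketch that mimics steps 3 to 5 with plain Python lists instead of tf.data, so it is not the graded code. Assumptions: the helper names `window` and `simulate_pipeline` are hypothetical; `window` imitates `Dataset.batch(n, drop_remainder=True)`, `split_input_target` follows the description in step 4, and a full `random.shuffle` stands in for the buffered `Dataset.shuffle(buffer_size)`.

```python
import random


def split_input_target(sequence):
    """Input = first seq_length items, target = last seq_length items (shifted by one)."""
    return sequence[:-1], sequence[1:]


def window(items, size):
    """Imitate batch(size, drop_remainder=True): consecutive, non-overlapping chunks.

    A final chunk shorter than `size` is dropped, so fewer than `size` items gives [].
    """
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def simulate_pipeline(ids, seq_length, batch_size, seed=0):
    """Hypothetical list-based walk through steps 3 to 5."""
    sequences = window(ids, seq_length + 1)             # step 3: sequences of seq_length + 1
    pairs = [split_input_target(s) for s in sequences]  # step 4: (input, target) pairs
    random.Random(seed).shuffle(pairs)                  # step 5: full shuffle (simplification)
    return window(pairs, batch_size)                    # step 5: batches of batch_size pairs


batches = simulate_pipeline(list(range(100)), seq_length=10, batch_size=4)
print(len(batches))                                       # 2
print(simulate_pipeline([7], seq_length=10, batch_size=4))  # []
```

With 100 ids, seq_length=10 and batch_size=4, there are 100 // 11 = 9 full sequences and 9 // 4 = 2 full batches. If the conversion step yields too few elements (for example, the whole line treated as one element instead of one per character), no sequence of seq_length + 1 survives drop_remainder=True and the result is empty, which is consistent with the symptom described in the question.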

Let me know if you still have doubts or if your issue is cleared.

Regards
DP


Thanks a lot for resolving the issue.
