The following is code given by the instructor:

import tensorflow as tf

# Signature inferred from the parameters used in the body
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    # Create a TF Dataset from the raw series
    dataset = tf.data.Dataset.from_tensor_slices(series)
    # Window the data, keeping only windows of the specified size
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    # Flatten the windows by putting their elements in a single batch
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
    # Create tuples with features and labels
    dataset = dataset.map(lambda window: (window[:-1], window[-1]))
    # Shuffle the windows
    dataset = dataset.shuffle(shuffle_buffer)
    # Create batches of windows
    dataset = dataset.batch(batch_size).prefetch(1)
    return dataset
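Calling this on a toy series makes the final structure visible. A minimal sketch (the concrete numbers, 10 points, window of 4, batch of 2, are just for illustration, not from the course):

```python
import numpy as np
import tensorflow as tf

series = np.arange(10, dtype=np.float32)  # 10 toy data points
window_size, batch_size, shuffle_buffer = 4, 2, 10

# Same pipeline as above, rebuilt inline
ds = tf.data.Dataset.from_tensor_slices(series)
ds = ds.window(window_size + 1, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda w: w.batch(window_size + 1))
ds = ds.map(lambda w: (w[:-1], w[-1]))
ds = ds.shuffle(shuffle_buffer)
ds = ds.batch(batch_size).prefetch(1)

# 10 points -> 6 windows of 5 values -> 3 batches of (features, label) pairs
for x, y in ds:
    print(x.shape, y.shape)  # (2, 4) (2,) each time
```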
The following is how I understand it, but I just want a comprehension check because I feel like I'm likely wrong:
- from_tensor_slices wraps the array of data in a Dataset object that TensorFlow understands
- We then window the data, adding 1 to the window size so that the last element of each window can serve as the label. (At this point I'm not sure what the structure of the data is. Assuming we had 100 data points and our window is 10, do we now have a 10 x 10 matrix?)
- We then put each window into batch form and flatten it. Again, I'm not sure of the structure of the data. Would it now be 100 x 1?
- We now take the last element of each window and turn it into the label for that window.
- We shuffle the windows.
- We batch them again and prefetch for processing?
I guess one of my questions is that it seems like we are doing the same thing a bunch of times. Can someone explain the intricacies of what each of those steps does, and how far off I am?
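Here is the small experiment I ran to try to see the intermediate structures (10 toy points and a window of 4, purely illustrative):

```python
import numpy as np
import tensorflow as tf

series = np.arange(10, dtype=np.float32)
window_size = 4

ds = tf.data.Dataset.from_tensor_slices(series)

# window() does NOT produce a matrix: it yields a dataset of sub-datasets.
# 10 points, size 5, shift 1, drop_remainder=True -> 6 sub-datasets
windows = ds.window(window_size + 1, shift=1, drop_remainder=True)
print(len(list(windows)))  # 6

# flat_map + batch collapses each sub-dataset into one tensor of 5 values,
# giving 6 tensors of shape (5,)
flat = windows.flat_map(lambda w: w.batch(window_size + 1))
for t in flat.take(2):
    print(t.numpy())  # [0. 1. 2. 3. 4.] then [1. 2. 3. 4. 5.]

# map() then splits each 5-vector into (4 features, 1 label)
pairs = flat.map(lambda w: (w[:-1], w[-1]))
for x, y in pairs.take(1):
    print(x.numpy(), y.numpy())  # [0. 1. 2. 3.] 4.0
```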