Take(1) returns different image each time?

I simplified the given code:

    for image, _ in train_dataset.take(1):
        plt.figure(figsize=(3, 3))
        first_image = image[0]
        ax = plt.subplot()
        augmented_image = tf.expand_dims(first_image, 0)
        plt.imshow(augmented_image[0] / 255)
        plt.axis('off')

There are two things I would like to understand:

  1. Why do I get a different image each time I run this code?
  2. Why is expand_dims needed? And if we need this dimension, why do we take [0] on the next line?

Hello Meir, thank you for your interesting question.

I will try to guide you to find the answers on your own.

  1. Check the TF source code, and if that does not help, I would be glad to explain more.

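  2. tf.expand_dims returns a tensor with a length-1 axis inserted at the index given by its second argument, axis (here 0, so the new axis comes first). If you want to know why it is needed, try removing it and see for yourself :wink:. The [0] index is used because the tensor returned by expand_dims has a leading axis of length 1, and [0] selects the single image stored along that axis.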

Hope this helps, if not I would be happy to assist more.
Regards,


  1. The code and the documentation suggest the opposite: that the same first elements of the dataset should be chosen each time. In fact, this is what happens when I try their example:

    dataset = tf.data.Dataset.range(10)
    dataset = dataset.take(3)
    list(dataset.as_numpy_iterator())

Why would something different happen with the images dataset?

  2. I am still lost. Could you please give a concrete example with concrete values?
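As a concrete example for the second point, here is a minimal sketch using a made-up batch of two tiny 4x4 RGB "images" (the shapes and values are assumptions chosen for illustration, not taken from the course data). It shows that tf.expand_dims(first_image, 0) adds a leading axis of length 1 and that indexing with [0] removes it again, so in the simplified code above the two steps cancel out.

```python
import tensorflow as tf

# Hypothetical stand-in for one batch: 2 images, 4x4 pixels, 3 channels.
batch = tf.reshape(tf.range(2 * 4 * 4 * 3, dtype=tf.float32), [2, 4, 4, 3])

first_image = batch[0]                      # shape (4, 4, 3): a single image
expanded = tf.expand_dims(first_image, 0)   # shape (1, 4, 4, 3): a batch holding one image
restored = expanded[0]                      # shape (4, 4, 3): the same image again

print(first_image.shape, expanded.shape, restored.shape)
print(bool(tf.reduce_all(restored == first_image)))
```

The extra axis only matters if something between the two lines expects batched input of shape (batch, height, width, channels). The variable name augmented_image hints that the original code may have passed the expanded tensor through such a layer, but that is a guess based on the name.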

Hi,

My understanding is that train_dataset takes its behavior from the arguments passed to

    image_dataset_from_directory(directory,
                                 shuffle=True,
                                 batch_size=BATCH_SIZE,
                                 image_size=IMG_SIZE,
                                 validation_split=0.2,
                                 subset='training',
                                 seed=42)

I realized that if you set shuffle=False in

    train_dataset = image_dataset_from_directory(directory,
                                                 shuffle=False,
                                                 batch_size=BATCH_SIZE,
                                                 image_size=IMG_SIZE,
                                                 validation_split=0.2,
                                                 subset='training',
                                                 seed=42)

then the images are always the same, no matter how many times you run the image plotting cell. If you set shuffle=True, the images are different at every run of the image plotting cell, but the order in which they appear is the same. I guess this means that with shuffle=False the 9 images come from the first batch of size 32, because the data is sorted in alphanumeric order. In contrast, with shuffle=True the data is reshuffled at every run of train_dataset.take(1), so we get a different batch every time. Nevertheless, in both cases the set of training images is the same: for shuffle=True it is shuffled, and for shuffle=False it is not.

Henrikh
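The behavior described above can be reproduced without any image files. This is a minimal sketch that assumes the shuffling inside image_dataset_from_directory behaves like Dataset.shuffle with reshuffle_each_iteration=True (the documented default). The helper first_elements and the range dataset are made up for illustration.

```python
import tensorflow as tf


def first_elements(dataset, runs=5):
    """Iterate dataset.take(1) several times and collect the element seen on each run."""
    return [int(next(iter(dataset.take(1))).numpy()) for _ in range(runs)]


plain = tf.data.Dataset.range(1000)

# Reshuffled on every new iteration, even though a seed is given.
shuffled = plain.shuffle(buffer_size=1000, seed=42, reshuffle_each_iteration=True)

# Shuffled once; every iteration replays the same order.
frozen = plain.shuffle(buffer_size=1000, seed=42, reshuffle_each_iteration=False)

plain_firsts = first_elements(plain)
shuffled_firsts = first_elements(shuffled)
frozen_firsts = first_elements(frozen)

print("no shuffle:       ", plain_firsts)
print("reshuffle each run:", shuffled_firsts)
print("shuffle once:      ", frozen_firsts)
```

Each call to take(1) starts a new pass over the dataset. Without shuffling, the pass always starts at element 0, which matches the documentation example quoted earlier. With reshuffling enabled, each pass draws a new order, so the first element changes from run to run. The seed makes the sequence of shuffles reproducible but does not make every pass identical.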