C3w3-lab2: python and tuple of tensors, how..?

shahin · July 12, 2021, 1:25pm

I am trying (and failing) to fully understand how the following code works.

def format_image(image, label):

This simple function is defined as taking two arguments (and running it shows that it receives tensorflow.python.framework.ops.Tensor objects).

But the following line seems to pass train_examples, a tf.python.data.ops.dataset_ops.PrefetchDataset, to format_image() (via the map() function), like so:

train_batches = train_examples.shuffle(num_examples // 4).map(format_image).batch(BATCH_SIZE).prefetch(1)

When I do:

for element in train_examples.as_numpy_iterator():
	print(type(element))
	break

I get <class 'tuple'>

I (very) tentatively concluded that the map() line is passing a tuple of tensors to the function (somehow wrapped inside a tf.data.Dataset?), rather than two separate tensors.

But if this is the case, why does Python allow this? The function expects two arguments, not a single tuple.

So, if I do (while commenting out the tf.image.resize() line):

format_image('confused', 1)

I get (as expected) no errors.

But if I do:

my_tuple = ('confused', 1)
format_image(my_tuple)

I get (as expected):

TypeError: format_image() missing 1 required positional argument: 'label'

So, why does train_examples.map(format_image) not also throw this TypeError?
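For comparison, plain Python does accept the tuple once it is unpacked with * at the call site. Below is a minimal sketch. The format_image body is a hypothetical stand-in that skips the tf.image.resize() step so it runs without TensorFlow:

```python
def format_image(image, label):
    # Hypothetical stand-in: the lab version also resizes the image with
    # tf.image.resize(); that step is omitted so this runs without TensorFlow.
    return image, label


my_tuple = ('confused', 1)

# Passing the tuple as one argument binds it to `image` and leaves
# `label` unfilled, so Python raises TypeError.
try:
    format_image(my_tuple)
except TypeError as err:
    print("TypeError:", err)

# The * operator unpacks the tuple into two positional arguments,
# so this call is equivalent to format_image('confused', 1).
result = format_image(*my_tuple)
print(result)  # ('confused', 1)
```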

(Since I can't get this information from the documentation or even from print statements, I think these powerful TensorFlow libraries are less accessible to newcomers. Copy-pasting this stuff and seeing that it just works, without really knowing how, will probably cause problems somewhere down the line, IMO.)

mjsmid · July 12, 2021, 7:09pm

Hello Shahin,

In the beginning I also had many issues understanding the data structures used in TensorFlow. It helps to take some time to go through the TensorFlow documentation and play with the notebooks. A great place to start for Datasets is the tf.data guide:

tf.data: Build TensorFlow input pipelines | TensorFlow Core

The following paragraph from the guide may help explain your question above:
The tf.data API introduces a tf.data.Dataset abstraction that represents a sequence of elements, in which each element consists of one or more components. For example, in an image pipeline, an element might be a single training example, with a pair of tensor components representing the image and its label.
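To connect this to your question, here is a simplified plain-Python model of what map() appears to do, based on the examples in the tf.data.Dataset.map API documentation: when each element is a tuple, its components are passed to the mapped function as separate positional arguments, while a dictionary element is passed as a single argument. simple_map is a hypothetical helper for illustration only, not part of TensorFlow; the real map() traces the function into a graph rather than calling it in a Python loop.

```python
from typing import Any, Callable, Iterable, Iterator


def simple_map(elements: Iterable[Any], map_func: Callable[..., Any]) -> Iterator[Any]:
    """Hypothetical, simplified model of how Dataset.map calls map_func.

    Not TensorFlow's implementation: it only mimics the documented behavior
    that a tuple element is spread across map_func's positional arguments,
    while any other element (e.g. a dict) is passed as one argument.
    """
    for element in elements:
        if isinstance(element, tuple):
            # Tuple element: unpack its components into separate arguments.
            yield map_func(*element)
        else:
            # Non-tuple element: pass it through as a single argument.
            yield map_func(element)


def format_image(image, label):
    # Hypothetical stand-in for the lab function (resize step omitted).
    return image, label


# Each element is an (image, label) pair, like the elements of train_examples.
pairs = [("img0", 0), ("img1", 1)]
mapped_pairs = list(simple_map(pairs, format_image))
print(mapped_pairs)  # [('img0', 0), ('img1', 1)]

# Dict elements reach map_func as a single argument.
records = [{"image": "img0", "label": 0}]
labels = list(simple_map(records, lambda d: d["label"]))
print(labels)  # [0]
```

This is why format_image(image, label) works with map(): each (image, label) element is unpacked before the call, so the function never sees the tuple itself.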

There is another helpful link for understanding TFRecord files and how to get at the underlying raw data stored inside them:

Good luck and happy learning!

Maarten
