TensorFlow input pipeline

Hi everyone,

I have been battling with this for days now; at the moment even ChatGPT can't help me, so I would appreciate any help out there.

So, I am building a TensorFlow input pipeline for U-Net, and my current challenge is executing a data augmentation function I defined. Before the augment() function, I applied two other functions, which I will describe below. I will use letters to show my steps and where I am stuck:

(A) I created a dataset: dataset = tf.data.Dataset.from_tensor_slices((img_paths, mask_paths)).

(B) I have a preprocess() function that takes the same arguments (img_path, mask_path) and returns img, mask: dataset = dataset.map(preprocess).

(C) Using that output, I transformed the data further with another function: dataset = dataset.map(img_resize). img_resize is defined to take img, mask as arguments and return img, mask at a new size.
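For context, here is a stripped-down, self-contained sketch of steps (A)-(C). The preprocess() and img_resize() bodies are placeholders (my real preprocess() reads and decodes the files), so the random tensors below are just stand-ins:

```python
import tensorflow as tf

# Hypothetical stand-ins for my real path lists
img_paths = [f"img_{i}.png" for i in range(24)]
mask_paths = [f"mask_{i}.png" for i in range(24)]

# (A) build the dataset from the path lists
dataset = tf.data.Dataset.from_tensor_slices((img_paths, mask_paths))

def preprocess(img_path, mask_path):
    # Placeholder body: real code would use tf.io.read_file + tf.io.decode_png
    img = tf.random.uniform((128, 128, 3))
    mask = tf.random.uniform((128, 128, 1))
    return img, mask

def img_resize(img, mask):
    # Resize both image and mask to a new size
    return tf.image.resize(img, (64, 64)), tf.image.resize(mask, (64, 64))

dataset = dataset.map(preprocess)  # (B)
dataset = dataset.map(img_resize)  # (C)
```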

(D) Now, here is where my problem began. Note: the augment() function is defined to take 4 arguments: (1) dataset, which is (img, mask) from the img_resize() output, (2) seed, (3) si and (4) ir. All of these arguments are created using tf.data.Dataset.from_tensor_slices(), e.g. tf.data.Dataset.from_tensor_slices(si), where si is a list of floating-point values.

After creating these arguments, I ran

print(dataset.cardinality())
print(seed.cardinality())
print(si.cardinality())
print(ir.cardinality())

All of these returned the same result, tf.Tensor(24, shape=(), dtype=int64), meaning there is no mismatch.

I also ran

print(len(dataset))
print(len(seed))
print(len(si))
print(len(ir))

and got the same output, 24, from all. The augment() function is designed to return img, mask as usual.

(E) At this point, I combined the datasets using zip() like this:

dataset_zip = tf.data.Dataset.zip((dataset, seed, si, ir)) and tried to apply augment() using map() as usual:

dataset = dataset_zip.map(augment), and here is the error message most of the time:

OperatorNotAllowedInGraphError: in user code:

File "C:\Users\bildad\AppData\Local\Temp\ipykernel_5152\381901545.py", line 44, in augment *
    seed1, seed2, seed3, seed4 = tf.random.experimental.stateless_split(seed, 4)

OperatorNotAllowedInGraphError: Iterating over a symbolic `tf.Tensor` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

(F) I tried applying augment() using another format, dataset = dataset_zip.map(lambda dataset, seed, si, ir: augment(dataset, seed, si, ir)), but got the same error message.

I can’t figure out why the augment() function isn’t working. Your help will be highly appreciated.

Thank you in advance.

Hi @Bildad,

The error OperatorNotAllowedInGraphError: Iterating over a symbolic tf.Tensor is not allowed in Graph execution occurs because tf.data.Dataset.map() builds a TensorFlow computation graph. Inside this graph, you cannot use standard Python loops or list unpacking (like a, b, c, d = ...) on a tensor.

The stateless_split function returns a tensor, which cannot be unpacked using Python syntax within the map function.

This happens because when dataset.map(augment) runs, TensorFlow converts the Python function into a TensorFlow graph. In graph mode, variables are not actual Python objects with values but "symbolic tensors" representing computations, and Python cannot iterate over a symbolic tensor to unpack it.

Instead of using seed1, seed2, seed3, seed4 = tf.random.experimental.stateless_split(...), you should use TensorFlow operations that work on tensors within a graph.
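The same error can be reproduced with a tiny toy function (hypothetical, not your augment()), which shows that it is the Python unpacking itself, not stateless_split, that graph mode rejects:

```python
import tensorflow as tf

@tf.function
def unpack_in_graph(t):
    # Python tuple unpacking iterates over the symbolic tensor,
    # which graph execution does not allow
    a, b = t
    return a + b

try:
    unpack_in_graph(tf.constant([1, 2]))
except Exception as err:
    print(type(err).__name__)  # prints the error class name
```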

Here are two approaches you can try:

  1. Unpack the tensor into a list of tensors using tf.unstack()
import tensorflow as tf

# Assuming 'seed' is passed in from dataset_zip
def augment(image, label, seed):
    # Instead of: seed1, seed2 = ...
    # Use tf.unstack
    seeds = tf.random.experimental.stateless_split(seed, 4)
    seed1, seed2, seed3, seed4 = tf.unstack(seeds) 
    
    # Example augmentation using the seeds
    # image = tf.image.stateless_random_flip_left_right(image, seed1)
    
    return image, label

# Apply mapping
dataset = dataset_zip.map(augment)
  2. If tf.unstack() doesn't work, try indexing
def augment(image, label, seed):
    seeds = tf.random.experimental.stateless_split(seed, 4)
    seed1 = seeds[0]
    seed2 = seeds[1]
    # ...
    return image, label
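Putting it together with your zipped dataset: note that tf.data.Dataset.zip((dataset, seed, si, ir)) yields elements of the form ((img, mask), seed, si, ir), so the mapped function receives the (img, mask) pair as its first argument. Here is a minimal end-to-end sketch with stand-in random tensors (the flip augmentation is just an example, not your full augment()):

```python
import tensorflow as tf

# Hypothetical stand-ins: 24 images/masks and one (2,)-shaped integer seed each
imgs = tf.random.uniform((24, 64, 64, 3))
masks = tf.random.uniform((24, 64, 64, 1))
pairs = tf.data.Dataset.from_tensor_slices((imgs, masks))
seeds = tf.data.Dataset.from_tensor_slices(
    tf.random.uniform((24, 2), maxval=2**30, dtype=tf.int32))

dataset_zip = tf.data.Dataset.zip((pairs, seeds))

def augment(pair, seed):
    image, mask = pair  # this is a Python tuple, so unpacking it is fine
    split = tf.random.experimental.stateless_split(seed, 4)
    seed1, seed2, seed3, seed4 = tf.unstack(split)  # graph-safe unpacking
    # Apply the same seed to image and mask so they stay aligned
    image = tf.image.stateless_random_flip_left_right(image, seed1)
    mask = tf.image.stateless_random_flip_left_right(mask, seed1)
    return image, mask

dataset = dataset_zip.map(augment)
```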

One thing you need to make sure of while mapping is to avoid Python list or tuple unpacking (a, b, c = tensor). Instead, use TensorFlow ops such as tf.unstack() or the simple indexing approach.
Also, always remember that all operations inside map() must be TensorFlow ops, i.e. tf.*.

Hope this helps.

regards
Dr. Deepti

1 Like

Amazing, unpacking the seeds using tf.unstack() worked perfectly. Thanks a lot @Deepti_Prasad

I also have a challenge where plt.imshow() crashes my kernel every time I try to run it. The kernel restarts automatically whenever I run plt.imshow() after the TensorFlow pipeline, once the dataset is put together. I tried visualizing the data after applying augment() using:

(A)

for image, mask in dataset.take(1):
    plt.imshow(image)
plt.show()

and

(B)

for image, mask in dataset.take(1):
    image = image.numpy()
    plt.imshow(image)
plt.show()

but the kernel always crashes. What I did:

(i) Updated my matplotlib

(ii) Ran both %matplotlib inline and %matplotlib notebook before and after importing matplotlib, but to no avail.

What I used to verify that my pipeline works is TensorFlow itself:

(C)

for image, mask in dataset.take(1):
    tf.keras.preprocessing.image.array_to_img(mask).show()

This worked very well. However, the output is displayed outside the Jupyter notebook.

This is what I did next;

(D) Before my TensorFlow pipeline, I ran this code:

## Matplotlib check on NumPy data ##
imgt = np.random.rand(224, 224, 3)
plt.imshow(imgt)
plt.show()

and it worked, meaning the library itself is working well. However, when I ran the same NumPy-generated image in a random cell in the notebook after the TensorFlow pipeline, the kernel crashes and says it will restart automatically.

Have an idea why?

Glad it worked.

Apologies, I cannot respond to your second query precisely, as I don't fully know your TensorFlow data pipeline and code.

As far as I remember, I encountered something like this in the TensorFlow Data and Deployment course, and from what I recall it can happen due to conflicting libraries or memory exhaustion.

Here is what you can do,

  1. Clear the kernel output, then restart the kernel using the Jupyter Kernel > Restart menu option, and re-run the code.

  2. Check library versions: make sure TensorFlow and Matplotlib are updated to compatible versions in your environment. Chances are a version mismatch is causing the issue.

  3. Lastly, check for a data mismatch: if you pass a raw TensorFlow tensor directly to plt.imshow(), it may cause issues in some environments.

Here the fix would be to explicitly convert the tensor to a NumPy array first:

plt.imshow(image.numpy()) # if it's a tensor
# OR
plt.imshow(np.array(image))
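If the crash persists even with NumPy input, one more thing worth trying is forcing a non-interactive backend and saving the figure to a file instead of rendering it inline. This is only a diagnostic sketch with a hypothetical random image, but it sidesteps the GUI/backend code path entirely:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, set before importing pyplot
import matplotlib.pyplot as plt

# Hypothetical stand-in; with a TF tensor you would call image.numpy() first
image = np.random.rand(64, 64, 3).astype(np.float32)
image = np.clip(image, 0.0, 1.0)  # float input to imshow should be in [0, 1]

fig, ax = plt.subplots()
ax.imshow(image)
fig.savefig("imshow_check.png")  # render to a file instead of a notebook widget
plt.close(fig)
```

If the saved file renders correctly, the data is fine and the problem lies in the interactive backend rather than in your pipeline.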

Regards
Dr. Deepti