Visualizing Intermediate Representations

Hi everyone,
Referring to the code below: what does the line x = feature_map[0, :, :, i] do? It is clearly a slicing operation, but why is it required? I am having real difficulty understanding this. Could anyone share their view? Thanks.

import os
import random

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# `model`, `train_horse_dir`, `train_human_dir`, `train_horse_names` and
# `train_human_names` are assumed to be defined earlier in the notebook.

# Define a new Model that will take an image as input, and will output
# intermediate representations for all layers in the previous model after
# the first.
successive_outputs = [layer.output for layer in model.layers[1:]]
visualization_model = tf.keras.models.Model(inputs=model.input, outputs=successive_outputs)

# Prepare a random input image from the training set.
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)

img = load_img(img_path, target_size=(300, 300))  # this is a PIL image
x = img_to_array(img)  # Numpy array with shape (300, 300, 3)
x = x.reshape((1,) + x.shape)  # Numpy array with shape (1, 300, 300, 3)

# Scale by 1/255
x /= 255

# Run the image through the network, thus obtaining all
# intermediate representations for this image.
successive_feature_maps = visualization_model.predict(x)

# These are the names of the layers, so you can have them as part of the plot
layer_names = [layer.name for layer in model.layers[1:]]

# Display the representations
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
  if len(feature_map.shape) == 4:
    # Just do this for the conv / maxpool layers, not the fully-connected layers
    n_features = feature_map.shape[-1]  # number of features in feature map
    # The feature map has shape (1, size, size, n_features)
    size = feature_map.shape[1]

    # Tile the images in this matrix
    display_grid = np.zeros((size, size * n_features))
    for i in range(n_features):
      x = feature_map[0, :, :, i]
      x -= x.mean()
      x /= x.std()
      x *= 64
      x += 128
      x = np.clip(x, 0, 255).astype('uint8')

      # Tile each filter into this big horizontal grid
      display_grid[:, i * size : (i + 1) * size] = x

    # Display the grid
    scale = 20. / n_features
    plt.figure(figsize=(scale * n_features, scale))
    plt.title(layer_name)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')

plt.show()

That line gets the individual “images” (really the output values) from the Conv2D and MaxPooling2D layers. The dimensions are [batch, height, width, channels]. Here the channels don’t come from RGB or a single grayscale channel; instead, the number of channels equals the number of kernels you have defined for that conv layer (16, 32, or 64).
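To make the shapes concrete, here is a minimal sketch that uses a random NumPy array as a stand-in for one layer's output. The sizes (a 149 x 149 map with 16 channels) and the choice of i = 3 are assumptions for illustration only, not values read from your model.

```python
import numpy as np

# Stand-in for one conv layer's output: (batch, height, width, channels).
# These sizes are illustrative assumptions, not taken from the real model.
rng = np.random.default_rng(0)
feature_map = rng.random((1, 149, 149, 16)).astype("float32")

i = 3  # pick one filter/channel (arbitrary choice)
x = feature_map[0, :, :, i]

print(feature_map.shape)  # (1, 149, 149, 16)
print(x.shape)            # (149, 149)
```

The slice drops the batch and channel axes, leaving a plain 2-D array that can be shown as an image.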

The outer loop goes through each layer you’ve defined, and the inner loop gets the output values for each kernel in turn (this is i being incremented). It then rescales those values so they fit the 0 to 255 display range and copies them into the pre-allocated grid, which is what gets plotted at the bottom.
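The rescale-and-tile step can be tried on its own with a tiny fake feature map. This is only a sketch: the sizes are made up, the data is random, and the .copy() is my addition so the stand-in array is left untouched (the original code works on the slice directly).

```python
import numpy as np

rng = np.random.default_rng(0)
size, n_features = 4, 3  # tiny illustrative sizes (assumed)
feature_map = rng.normal(size=(1, size, size, n_features))

# One wide canvas: each filter's output gets its own size-wide column block.
display_grid = np.zeros((size, size * n_features))
for i in range(n_features):
    x = feature_map[0, :, :, i].copy()  # copy is an addition for this sketch
    x -= x.mean()       # centre the values on zero
    x /= x.std()        # scale to unit standard deviation
    x = x * 64 + 128    # move to mid-grey and stretch the contrast
    x = np.clip(x, 0, 255).astype("uint8")
    display_grid[:, i * size:(i + 1) * size] = x

print(display_grid.shape)  # (4, 12)
```

Every value in display_grid ends up between 0 and 255, so the maps from different filters are comparable when drawn side by side.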

What is this operation called? Is it some kind of slicing in 3D or 2D? Where can I read more about it? Could you share a link?
Thank you!

This is indexing (or slicing), though on a 4-dimensional array in this case.

For [0, :, :, i]: we’re only running a single image through the model, so the layer output for that image is the first and only item in the batch, hence the 0 at index 0. Then we want all the “pixels” of the image (really the values output from the conv and maxpool layers), which we get using : at index 1 and index 2. Finally, we need these values for each kernel/filter defined in the conv layers, which is why i is at index 3. As we iterate, we get the values produced by each filter one at a time.

This page on the TF site talks a bit more about shapes, slicing, and indexing for tensors: Introduction to Tensors | TensorFlow Core
But I think the best way to get intuition for this is to experiment with some simple tensors and see what happens when you do different things.
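As one possible starting point for that kind of experiment, here is a small NumPy sketch. The array t and its shape are made up purely for illustration; it peels off the same indices as [0, :, :, i] in a few equivalent ways.

```python
import numpy as np

# A tiny 4-D array: 1 image, 2 rows, 3 columns, 4 channels.
t = np.arange(24).reshape(1, 2, 3, 4)

a = t[0, :, :, 1]   # all rows and columns of channel 1 for image 0
b = t[0][:, :, 1]   # same result in two steps: pick the image, then the channel
c = t[0, ..., 1]    # `...` stands in for the two full slices

print(a)            # [[ 1  5  9]
                    #  [13 17 21]]
print(a.shape)      # (2, 3)
print(np.array_equal(a, b), np.array_equal(a, c))  # True True

# Basic slicing returns a view, so in-place edits also change the original.
a += 100
print(t[0, 0, 0, 1])  # 101
```

The last two lines are worth noting: because the slice is a view, in-place operations on it, such as x -= x.mean() in the code above, also modify the array it was taken from.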

Thanks a lot!

Thank you @asii_k for your reply. It will surely help others.
