In Lesson 1, Jupyter Notebook Cell 3, why are we flattening each image into one single vector?

The array x_tr has the shape (60000, 28, 28), which means there are 60000 images, each image has 28 rows, and each row has 28 pixels, each representing an intensity from 0 to 255.

So in Cell 3…

#Normalize and Reshape images (flatten)
x_tr, x_te = x_tr.astype('float32')/255., x_te.astype('float32')/255.
x_tr_flat, x_te_flat = x_tr.reshape(x_tr.shape[0], -1), x_te.reshape(x_te.shape[0], -1)

The first line of code normalized the values, which I understand. I don’t get why the second line of code flattens each image into one single vector, instead of keeping the rows segmented. This results in 60000 images which have 784 pixels.

Why do we do this? The original shape of the matrices seems more organized to me. Can somebody please explain?

Flattening an image into a vector is a standard technique in machine learning.

  1. Most of the tools we use expect each example to be a vector. This is mathematically a very efficient representation.
  2. It’s not obvious why, but the method works well in practice. It’s only your brain’s visual system that requires the pixels in each row to be adjacent to each other. The learning algorithm doesn’t have this requirement, since it doesn’t have a 2D retina surface. A fully connected layer treats each pixel as a separate input feature, so it works the same as long as every image is flattened in the same order.
  3. There are methods for handling images as 2D objects, such as convolutional neural networks. These are much more computationally expensive, so they are only used in situations where the benefits are worth the cost.
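
To make point 2 concrete, here is a minimal NumPy sketch. It uses random stand-in data rather than the real MNIST arrays, and the names `images`, `flat`, `perm`, and `W` are made up for illustration. It checks that flattening loses no information, and that a dense layer (reduced here to a plain matrix product) gives the same output when the pixels are shuffled in a fixed order and its weight rows are shuffled the same way.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a small batch of MNIST images: 5 images of 28x28 pixels in [0, 1)
images = rng.random((5, 28, 28))

# Flatten each image into a 784-element vector, as in Cell 3
flat = images.reshape(images.shape[0], -1)
flat_shape = flat.shape

# Reshaping back recovers the original images exactly, so nothing is lost
roundtrip_ok = np.array_equal(flat.reshape(-1, 28, 28), images)

# A dense layer without bias or activation is a matrix product: 784 inputs -> 16 units
W = rng.standard_normal((784, 16))
out = flat @ W

# Shuffle the pixels in a fixed order and shuffle the weight rows the same way
perm = rng.permutation(784)
out_shuffled = flat[:, perm] @ W[perm, :]

# Same result up to floating-point rounding: pixel order carries no meaning for the layer
same_output = np.allclose(out, out_shuffled)
```

In other words, the layer can learn whatever pixel ordering it is given, provided that ordering is the same for every image.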

An autoencoder aims to compress the input. If we want to represent an image of 784 pixels with just 2 values, keeping the rows as a separate dimension won’t work: a Dense layer acts only on the last axis, so every row would get its own output.

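import tensorflow as tf

(X_train, y_train), _ = tf.keras.datasets.mnist.load_data()
assert X_train.shape == (60000, 28, 28)
assert y_train.shape == (60000,)

# Dense layers expect floating-point inputs
X_float = X_train.astype("float32") / 255.0

# without flattening: Dense acts on the last axis only, so each of the 28 rows is mapped separately
# (a final 2-unit layer is assumed here so that the outputs match the asserted shapes)
model1 = tf.keras.Sequential([
    tf.keras.Input(shape=X_train.shape[1:]),
    tf.keras.layers.Dense(16),
    tf.keras.layers.Dense(2),
])
assert model1(X_float).shape == (60_000, 28, 2)

# with flattening: all 28 * 28 = 784 pixels of an image feed every unit
model2 = tf.keras.Sequential([
    tf.keras.Input(shape=(28 * 28,)),
    tf.keras.layers.Dense(16),
    tf.keras.layers.Dense(2),
])
assert model2(X_float.reshape(60_000, -1)).shape == (60_000, 2)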

Hi @balaji.ambresh! I appreciate your answer. However, I don’t understand what you are trying to do with both models. I don’t understand the differences in their architecture and functionality. Could you please elaborate a bit more, if possible?

model1 uses unflattened inputs, i.e. each example has shape (28, 28). The output for each example has shape (28, 2). This doesn’t satisfy the need to represent the entire image by 2 values.

model2 uses flattened inputs, i.e. each example has shape (784,). The output for each example has shape (2,). This means that the entire image is represented by 2 values, which is what we want.

Use model1.summary() and model2.summary() to look at the architecture and output shapes of both models.
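
If TensorFlow is not at hand, the shape difference can be reproduced with plain NumPy. This is only a sketch under simplifying assumptions: the weights are random, bias and activation are left out, and `dense` is a hypothetical helper that mimics how a Dense layer multiplies along the last axis of its input.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, units):
    """Hypothetical stand-in for a Dense layer: multiply along the last axis only."""
    w = rng.standard_normal((x.shape[-1], units))
    return x @ w


# Stand-in for a batch of 8 MNIST images
x = rng.random((8, 28, 28))

# Without flattening: each of the 28 rows is mapped on its own
out_unflat = dense(dense(x, 16), 2)

# With flattening: all 784 pixels of an image feed every unit
out_flat = dense(dense(x.reshape(8, -1), 16), 2)

# Number of output values produced per image in each case
values_per_image_unflat = out_unflat[0].size
values_per_image_flat = out_flat[0].size
```

Here `out_unflat` has shape (8, 28, 2), so each image is still described by 28 * 2 = 56 numbers, while `out_flat` has shape (8, 2), i.e. 2 numbers per image.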