In Lesson 1, Jupyter Notebook Cell 3, why are we flattening each image into one single vector?

Govarthenan_Rajadura · November 15, 2023, 3:36pm

The training images in x_tr have the shape (60000, 28, 28), which means there are 60000 images, each image has 28 rows, and each row has 28 pixels, each representing an intensity from 0 to 255.

So in Cell 3…

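# Normalize pixel values to [0, 1], then reshape (flatten) each image
x_tr, x_te = x_tr.astype('float32')/255., x_te.astype('float32')/255.
# Keep the first axis (number of images) and merge the 28x28 pixel axes into one axis of length 784
x_tr_flat, x_te_flat = x_tr.reshape(x_tr.shape[0], -1), x_te.reshape(x_te.shape[0], -1)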

The first line of code normalizes the values, which I understand. What I don't get is why the second line flattens each image into a single vector instead of keeping the rows separate. The result is 60000 images of 784 pixels each.
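To see that flattening loses nothing, here is a small NumPy sketch on random stand-in data (not the real MNIST arrays; the 5-image size is an assumption to keep it fast). `reshape(n, -1)` lays each image's rows end to end in row-major (C) order, so pixel (r, c) lands at position r * 28 + c, and reshaping back recovers the original images exactly.

```python
import numpy as np

# Random stand-in for x_tr: 5 "images" of 28x28 intensities, normalized to [0, 1]
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28)).astype("float32") / 255.0

# Flatten as in Cell 3: keep the first axis, merge the remaining two
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (5, 784)

# Row-major order: pixel (r, c) of an image ends up at index r * 28 + c
r, c = 3, 17
assert flat[2, r * 28 + c] == images[2, r, c]

# Flattening is lossless: reshaping back gives the same images
restored = flat.reshape(-1, 28, 28)
assert np.array_equal(restored, images)
```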

Why do we do this? The original shape of the matrices seems more organized to me. Can somebody please explain?

TMosh · November 15, 2023, 3:58pm

Flattening an image into a vector is a standard technique in machine learning.

Most of the tools we use expect each example to be a vector. A whole dataset then becomes a 2D matrix with one row per example, which maps directly onto efficient matrix multiplication.
It's not obvious why, but the method works perfectly well. It's only your brain's visual system that requires the pixels in each row to be adjacent to each other. The ML algorithm doesn't have this requirement, since it doesn't have a 2D retina. A fully connected layer connects every pixel to every unit, so it is perfectly happy with any pixel order, as long as the same order is used for every image.
There are methods for handling images as 2D objects, such as convolutional neural networks. These are much more computationally expensive, so they are only used in situations where the benefits are worth the cost.
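A dense layer just multiplies the input vector by a weight matrix, so it has no built-in notion of which pixels are neighbors. The NumPy sketch below (random inputs and weights; the 784-to-16 sizes are assumptions for illustration) shuffles the 784 pixel positions with one fixed permutation shared by all images. If the weight rows are permuted the same way, the layer's output is unchanged, which is why the pixel order only needs to be consistent, not spatially meaningful.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((10, 784))    # 10 flattened images
W = rng.random((784, 16))    # dense-layer weights: 784 inputs -> 16 units
b = rng.random(16)           # dense-layer bias

perm = rng.permutation(784)  # one fixed shuffle of pixel positions, used for every image

out_original = x @ W + b
# Shuffle the input pixels and the matching weight rows in the same way
out_shuffled = x[:, perm] @ W[perm, :] + b

assert np.allclose(out_original, out_shuffled)
```

In practice, a network trained from scratch on the shuffled pixels can learn the permuted weights just as easily as the original ones.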

balaji.ambresh · November 15, 2023, 4:01pm

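An autoencoder aims to compress the input. If we want to represent an image of 784 pixels with just 2 values, keeping the 2D shape won't work: a Dense layer only operates on the last axis of its input, so an unflattened image would be compressed row by row instead of as a whole.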
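The last-axis behavior can be mimicked in plain NumPy with matrix products. This is a minimal sketch with random weights and biases omitted; the 16-unit and 2-unit sizes mirror the example models, and the 4 stand-in images are an assumption. On unflattened images the result is 28 rows x 2 values per image; on flattened images it is 2 values per image.

```python
import numpy as np

rng = np.random.default_rng(2)
images = rng.random((4, 28, 28))  # 4 stand-in images

W1_rows = rng.random((28, 16))    # first layer acting on each 28-pixel row
W1_flat = rng.random((784, 16))   # first layer acting on the whole 784-pixel vector
W2 = rng.random((16, 2))          # 2-unit bottleneck (biases omitted for brevity)

# Without flattening: matmul applies to the last axis, so each row is encoded separately
codes_rows = images @ W1_rows @ W2
print(codes_rows.shape)  # (4, 28, 2)

# With flattening: each image is one 784-vector, encoded into just 2 numbers
codes_flat = images.reshape(4, -1) @ W1_flat @ W2
print(codes_flat.shape)  # (4, 2)
```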

import tensorflow as tf

(X_train, y_train), _ = tf.keras.datasets.mnist.load_data()
assert X_train.shape == (60000, 28, 28)
assert y_train.shape == (60000,)
X_train = X_train.astype("float32") / 255.0  # normalize pixel values to [0, 1]

# without flattening: Dense acts only on the last axis (the 28 pixels of each row)
model1 = tf.keras.Sequential([
    tf.keras.Input(shape=X_train.shape[1:]),  # (28, 28)
    tf.keras.layers.Dense(16),
    tf.keras.layers.Dense(2),
])
assert model1(X_train).shape == (60_000, 28, 2)

# with flattening: each image becomes one vector of 28 * 28 = 784 pixels
model2 = tf.keras.Sequential([
    tf.keras.Input(shape=(28 * 28,)),
    tf.keras.layers.Dense(16),
    tf.keras.layers.Dense(2),
])
assert model2(X_train.reshape(60_000, -1)).shape == (60_000, 2)

Govarthenan_Rajadura · November 15, 2023, 5:21pm

Hi @balaji.ambresh! I appreciate your answer. However, I don't understand what you are trying to do with the two models, or the differences in their architecture and functionality. Could you please elaborate a bit more, if possible?

balaji.ambresh · November 15, 2023, 6:17pm

model1 uses unflattened inputs, i.e. each example (image) has shape (28, 28). Because Dense layers act on the last axis, the output for each image has shape (28, 2): two values per row. This doesn't satisfy the goal of representing the entire image with 2 values.

model2 uses flattened inputs, i.e. each example has shape (784,). The output for each example has shape (2,), so the entire image is represented by 2 values, which is what we want.

Use model1.summary() and model2.summary() to look at the architecture and output shapes of both models.
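As a cross-check on what summary() shows, a Dense layer with n_in inputs and `units` outputs has n_in * units + units trainable parameters (weights plus biases). The small helper below is hypothetical (not part of Keras) and simply applies that formula to the two models above.

```python
def dense_params(n_in, units):
    """Trainable parameters of a Dense layer: weight matrix plus bias vector."""
    return n_in * units + units

# model1: the first Dense layer only sees the last axis (28 pixels per row)
model1_total = dense_params(28, 16) + dense_params(16, 2)
# model2: the first Dense layer sees all 784 pixels of the flattened image
model2_total = dense_params(28 * 28, 16) + dense_params(16, 2)

print(model1_total)  # 498
print(model2_total)  # 12594
```

The much smaller count for model1 reflects that its weights are shared across all 28 rows, so it never combines information from different rows.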
