The matrices x_tr and y_tr both have the shape (60000, 28, 28), which means: there are 60000 images, each image has 28 rows, and each row has 28 pixels, each representing an intensity from 0 to 255.

The first line of code normalizes the values, which I understand. What I don’t get is why the second line of code flattens each image into a single vector instead of keeping the rows separate. This results in 60000 images of 784 pixels each.
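For context, the two lines being asked about are presumably something like the following (a NumPy sketch with random data standing in for the MNIST arrays; the exact code in the course may differ):

```python
import numpy as np

# Dummy data standing in for the MNIST training images (hypothetical x_tr).
x_tr = np.random.randint(0, 256, size=(60000, 28, 28)).astype("float64")

# "First line": scale pixel intensities from [0, 255] down to [0, 1].
x_tr = x_tr / 255.0

# "Second line": flatten each 28x28 image into a single 784-element vector.
x_tr = x_tr.reshape((60000, 784))

print(x_tr.shape)  # (60000, 784)
```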

Why do we do this? The original shape of the matrices seems more organized to me. Can somebody please explain?

Flattening an image into a vector is a standard technique in machine learning.

Most of the tools we use expect each example to be a vector. Mathematically, this is a very efficient representation.

It’s not obvious why, but the method works perfectly well. It’s only your brain’s visual system that requires the pixels in each row to be adjacent to one another. The learning algorithm has no such requirement, since it doesn’t have a 2D retina. It is perfectly happy if adjacent pixels end up scattered throughout the vector, as long as they land in the same positions for every image.
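To make that concrete: a fully connected layer only ever computes weighted sums over the vector, so applying the same fixed pixel shuffle to every input (and the matching shuffle to the weights) changes nothing. A small NumPy sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((5, 784))      # 5 flattened "images"
perm = rng.permutation(784)   # one fixed pixel shuffle, applied consistently
x_shuffled = x[:, perm]

# A dense layer sees only a weighted sum of features, so shuffling the
# weights the same way produces identical outputs.
w = rng.random(784)
print(np.allclose(x @ w, x_shuffled @ w[perm]))  # True
```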

There are methods that do treat images as 2D objects, such as convolutional neural networks. These are much more computationally expensive, so they are only used in situations where the benefits are worth the cost.
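By contrast with a dense layer, a convolution explicitly depends on spatial neighbourhoods, which is why it needs the 2D shape. A minimal sketch in plain NumPy (hypothetical edge-detection kernel, not any library's actual implementation):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Minimal 2D cross-correlation: each output value depends on a
    small neighbourhood of adjacent pixels, so 2D structure matters."""
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(28 * 28, dtype=float).reshape(28, 28)
edge = np.array([[1.0, 0.0, -1.0]] * 3)  # simple vertical-edge kernel
print(conv2d_valid(img, edge).shape)     # (26, 26)
```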

An autoencoder aims to compress the input. If we want to represent a 784-pixel image with just 2 values, keeping the rows as separate dimensions won’t work.

Hi @balaji.ambresh! I appreciate your answer. However, I don’t understand what you are trying to do with the two models, or the differences in their architecture and functionality. Could you please elaborate a bit more, if possible?

model1 uses unflattened inputs, i.e. each example is of shape (28, 28). The output for each example is of shape (28, 2). This doesn’t satisfy the requirement of representing the entire image by 2 values.

model2 uses flattened inputs, i.e. each example is of shape (784,). The output for each example is of shape (2,). This means the entire image is represented by 2 values, which is what we want.

Use model1.summary() and model2.summary() to inspect the architecture and output shapes of both models.
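A rough NumPy sketch of the shape difference described above (random data; a Keras Dense(2) layer likewise applies its weights along the last axis of the input):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for model1's bottleneck weights: maps the last axis, 28 -> 2.
w = rng.random((28, 2))
img_2d = rng.random((28, 28))    # unflattened image
print((img_2d @ w).shape)        # (28, 2): two values PER ROW, not per image

# Stand-in for model2's bottleneck weights: maps 784 -> 2.
w_flat = rng.random((784, 2))
img_1d = img_2d.reshape(784)
print((img_1d @ w_flat).shape)   # (2,): two values for the whole image
```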