Vectorization for Multi-Dimensional Input Matrices, Like Images

I am referencing the C2 W1 assignment, but this question is not about the assignment itself, and I have already finished it anyway.
In the C2 W1 assignment, there is a neural network that classifies handwritten digits, and for the input each image is unrolled into individual pixels, i.e., a 20 x 20 image becomes a 400 x 1 vector of pixels. Does this lose the positional information of the pixels?
With the input shape set to (400,), the parameter counts are as follows:

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_3 (Dense)             (None, 25)                10025     
                                                                 
 dense_4 (Dense)             (None, 15)                390       
                                                                 
 dense_5 (Dense)             (None, 1)                 16        
                                                                 
=================================================================
Total params: 10,431
Trainable params: 10,431
Non-trainable params: 0
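As a quick check on these numbers: a Dense layer with n_in inputs and n_units units has n_in × n_units weights plus n_units biases. A minimal sketch (the helper name `dense_params` is just for illustration):

```python
def dense_params(n_in, n_units):
    # one weight per (input, unit) pair, plus one bias per unit
    return n_in * n_units + n_units

# layer sizes of "my_model": 400 inputs -> 25 -> 15 -> 1
sizes = [400, 25, 15, 1]
counts = [dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
print(counts)       # [10025, 390, 16]
print(sum(counts))  # 10431
```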

I am practicing by coding all the concepts from week 1 from scratch to get a better understanding, and I am using the “mnist” dataset I found in the TensorFlow tutorials, which has 28 x 28 images of digits. I wanted to see what would happen if I created a model with the input shape set to (28, 28), and this is the model summary I got:

Model: "3_layer_model_Sigmoid"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 layer_1 (Dense)             (None, 28, 25)            725       
                                                                 
 layer_2 (Dense)             (None, 28, 15)            390       
                                                                 
 layer_3 (Dense)             (None, 28, 10)            160       
                                                                 
=================================================================
Total params: 1275 (4.98 KB)
Trainable params: 1275 (4.98 KB)
Non-trainable params: 0 (0.00 Byte)
____________________________________
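These numbers are consistent with each Dense layer seeing only the last axis of its input (size 28) and reusing the same weights for each of the 28 rows; `None` in the summary is the batch dimension. A NumPy sketch of that forward pass, assuming sigmoid everywhere and random weights (it only checks shapes and parameter counts, it is not a trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
batch = 4
x = rng.random((batch, 28, 28))  # stand-in for 4 MNIST images

# (size of last input axis, units) for layer_1..layer_3
layer_sizes = [(28, 25), (25, 15), (15, 10)]
params = []
a = x
for n_in, n_units in layer_sizes:
    W = rng.standard_normal((n_in, n_units)) * 0.1  # kernel spans only the last axis
    b = np.zeros(n_units)
    a = sigmoid(a @ W + b)  # matmul broadcasts over the batch and row axes
    params.append(W.size + b.size)
    print(a.shape)  # (4, 28, 25), then (4, 28, 15), then (4, 28, 10)

print(params, sum(params))  # [725, 390, 160] 1275
```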

I forgot to mention that all layers in both models use sigmoid as the activation. I am having a hard time understanding the output shapes of these layers. What does vectorization look like with multi-dimensional input data? Maybe if I understood vectorization for a single linear regression unit with multi-dimensional input features, I would understand this better, or am I totally off track with this whole thing?

Edit: I think for images I should probably flatten, like in the TensorFlow tutorial, but then why does the Dense layer accept multi-dimensional inputs, and where is this used?
Also, tf.keras.layers.Flatten() is just an image/data processing layer, right? It falls under data manipulation rather than being part of the machine learning algorithm itself? And can we, and do we, use these data manipulation layers in between Dense layers? Sorry if the question is getting too long; I will remove it if that is a problem.

Yes, it does.

But since all of the images are distorted into vectors in exactly the same way, the learning still works.
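To illustrate, assuming the images are unrolled row by row (NumPy's default C order): pixel (row, col) of a 20 x 20 image always lands at index row * 20 + col, in every image, and the reshape can be undone exactly. A small sketch with random stand-in images:

```python
import numpy as np

rng = np.random.default_rng(1)
images = rng.random((3, 20, 20))   # three stand-in 20 x 20 images

flat = images.reshape(3, 400)      # unroll each image row by row (C order)

# Pixel (r, c) lands at the same index r * 20 + c in every image
r, c = 7, 13
same_spot = all(flat[i, r * 20 + c] == images[i, r, c] for i in range(3))
print(same_spot)  # True

# The reshape is exactly reversible
restored = flat.reshape(3, 20, 20)
print(np.array_equal(restored, images))  # True
```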

I’m curious what you mean exactly by “vectorization” in this context?

If you want to preserve the images as flat 2D matrices (the way your eye perceives them), instead of unrolling them into vectors, you can use a different NN technique called convolution.

It’s not covered in this course, but it is the subject of Course 4 in the Deep Learning Specialization.

By vectorization, I mean replacing a loop of per-sample calculations with a single matrix multiplication. For example, say the input is (100, 5), i.e., 100 samples with 5 features, and the model is plain linear regression (or a sigmoid unit). Vectorization means that instead of looping over samples, we take the dot product (w · x) and add b.

So for a single sample it is f(x) = W(1, 5) * X(5, 1) + b(1, ), or equivalently W(5, ) · X(5, ) + b(1, ).
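The loop-versus-matrix idea described above, sketched in NumPy for 100 samples with 5 features and a single sigmoid unit (random data, only to compare the two approaches):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
X = rng.random((100, 5))   # 100 samples, 5 features
w = rng.random(5)          # one weight per feature
b = 0.5

# Loop version: one dot product per sample
f_loop = np.array([sigmoid(np.dot(w, X[i]) + b) for i in range(X.shape[0])])

# Vectorized version: one matrix-vector product for all samples
f_vec = sigmoid(X @ w + b)   # (100, 5) @ (5,) -> (100,)

print(f_vec.shape)                 # (100,)
print(np.allclose(f_loop, f_vec))  # True
```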

Now, if the input is (100, 5, 6), what would the shapes of the matrices in the expression above be?
Is it f(x) = W(6, 5) * X(5, 6) + b(1, )?
But in that case the output would be a matrix instead of a single value. Also, the product would be a (6, 6) matrix, which means more output values than input values.

I think what the TensorFlow model did is
f(x) = W(1, 5) * X(5, 6) + b(1, )
which would give an output matrix of (1, 6).
And with 25 units, the output would be (1, 6, 25), which is somewhat similar to what TensorFlow produced in the example in my main post. Even then, that shape does not make sense to me.

I get the feeling that I am doing something wrong by passing each sample's features as a 2D (or higher-dimensional) array instead of a 1D vector, but then why does the TensorFlow model allow it?

This is from the TensorFlow documentation for the Dense layer.
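As I read the Keras Dense documentation, when the input has rank greater than 2 the layer takes the dot product along the last axis of the input and axis 0 of the kernel. So for the (100, 5, 6) example, the kernel would be W(6, units), each of the 5 rows of 6 values is treated as its own input sharing the same weights, and the output is (100, 5, units). A NumPy sketch of that behaviour with units = 25 and random values (this mimics the documented shapes, it is not TensorFlow itself):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((100, 5, 6))   # 100 samples, each a 5 x 6 grid of features
units = 25
W = rng.random((6, units))    # kernel spans only the last axis (size 6)
b = rng.random(units)

# Contract the last axis of X with the first axis of W
out = np.tensordot(X, W, axes=([2], [0])) + b
print(out.shape)  # (100, 5, 25)

# Same result with an explicit loop over the middle axis:
# each of the 5 rows goes through the same 6 -> 25 layer
out_loop = np.stack([X[:, j, :] @ W + b for j in range(5)], axis=1)
print(np.allclose(out, out_loop))  # True

# Parameter count depends only on the last axis: 6 * 25 + 25
print(W.size + b.size)  # 175
```

The (28, 28) MNIST model follows the same pattern: each of the 28 rows is a separate 28-feature input, so the last layer produces 28 sets of 10 outputs per image rather than a single set of 10.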
