Need intuition for image vector and reshaping

Hi there, I am Abhijeet Deshmukh, a new member of the deeplearning.AI community (or rather, a new student of AI) :grinning_face_with_smiling_eyes:.
I have a very naïve question about the cat image classifier.
When we take an image of a cat, how do we convert that image into a matrix? (Note: I am asking how the image is vectorized, that is, how a physical quantity is converted into mathematical form.) When we break an image into a grid of pixels, we filter each pixel into three layers and get our matrix, right? How do we do that? How do we convert a colored pixel into a matrix, and on what basis do we define this matrix?
What is the basic definition of color and sketches in mathematics? How do humans or animals understand an image, and how do mathematics and science actually understand it? What are the similarities and differences between them?

I know it's a naïve question, and perhaps a nuisance to ask here, but I am still hoping for some response :slightly_smiling_face:

In the end, when we learn the parameters of our model, do these parameters actually find the basic fundamental properties of the pixels, or are they just capturing a shadow, a rough resemblance, of the colors? (When our parameter w takes a dot product with x (the position of the pixel), does it really match the sense of what the original sketch and color are?)

Here’s a thread which discusses this in some detail. Feel free to ask more followup questions here on this thread if the other thread does not address everything that you are asking about.

Thanks @paulinpaloalto, I got to learn the difference between reshape(…,-1).T and reshape(-1,…).
But what I am asking is not related to the Week 2 assignment question; I am asking something more fundamental. Let me explain it more elaborately.
What we have is a dataset of 209 samples of size 64x64, which means 64^2 pixels per image. Now consider one pixel out of those 64^2: that one pixel will have some red, green, and blue color to it, right? And we extract R as something like 0x0F, G as 0xB0, and B as 0xC0 (some random values of the red, green, and blue channels). So how does that happen? How do we get these 3 RGB values and form the 64x64x3 matrix? That's my question.

And the reason I am asking this question is: when we reshape that matrix into (64x64x3, 209), we are actually storing RGB values labeled by pixel position and image index. That is, say for image index 25, at pixel position (2, 3), we have R=1, G=2, B=3; after reshaping, we store those RGB values (1, 2, 3) in the column for image 25, at an offset in that column determined by the pixel position, somewhere in our 209x64x64x3 worth of values in the numpy array.
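A small NumPy sketch of that index arithmetic (toy data standing in for the real dataset): with NumPy's default C order, the three channel values of pixel (r, c) land at offset (r*64 + c)*3 in that image's flattened row.

```python
import numpy as np

# Toy stand-in for the (209, 64, 64, 3) cat dataset.
X = np.arange(209 * 64 * 64 * 3, dtype=np.int64).reshape(209, 64, 64, 3)

# Flatten each image into one row (default C order).
X_flat = X.reshape(209, -1)

# Pixel (r, c) = (2, 3) of image i = 25: its 3 channel values land at
# offset (r*64 + c)*3 = 393 in that image's row.
r, c, i = 2, 3, 25
offset = (r * 64 + c) * 3
print(np.array_equal(X_flat[i, offset:offset + 3], X[i, r, c, :]))  # True
```

(Transposing with `.T` afterwards, as the assignment does, turns each row into a column; the offset within the column is the same.)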

In the end, what we are storing as an array is the red, green, and blue values per pixel per image, and we are deciding the entire logistic regression output by checking whether the red value at pixel x of one image matches the red value at the same pixel of some test image. That's it: in the end we are comparing the colors of images, and based on our parameters we get yes or no as the output.

Can't we compare something other than RGB colors? How do animals and humans distinguish pictures: do they distinguish them using colors, or something else? This is my question.

Obviously this is not related to the assignment at all, but I am looking for a better solution to this problem than just reading colors.

Hi, each element of the (64x64) grid is an array [R, G, B], so the whole thing is a box with width (64), height (64), and depth (3).

Yes, as @dinhngoc says, the pixel color values are supplied by the last “channel” dimension. For each of the 64 x 64 pixel positions we have three 8 bit unsigned integer values for the R, G and B color intensity values which are the 3 channels. We don’t have to “determine” these: they are given to us. They are produced by the camera that digitized the image. Note that there are lots of different image formats and not all of them use RGB, but it is one of the common image representations. If you want to know more about how image encoding works, try googling something like “what is a JPEG image”.
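To see that structure concretely, here is a minimal NumPy sketch (a synthetic image standing in for one produced by a camera or decoded by a library such as PIL):

```python
import numpy as np

# A fake 64x64 RGB image: every pixel is pure red (R=255, G=0, B=0).
# Real decoded images arrive as the same kind of uint8 array.
img = np.zeros((64, 64, 3), dtype=np.uint8)
img[:, :, 0] = 255          # channel 0 = red intensity

print(img.shape)            # (64, 64, 3)
print(img[2, 3])            # [255   0   0]  -> the [R, G, B] of pixel (2, 3)
```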

Yes, the point is that everything our algorithms can do is based on analyzing the color values of the pixels in order to learn to recognize primitive shapes, e.g. edges and curves, and then putting those together to recognize more complex structures like a cat's ear or tail.
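As a toy illustration of how shapes like edges fall out of nothing but pixel intensity values, here is a simple hand-written difference filter (just a sketch; this is the kind of feature a network learns, not what logistic regression itself does):

```python
import numpy as np

# Grayscale image: dark left half, bright right half -> one vertical edge.
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# Horizontal-difference filter: the response is large only where
# neighboring columns differ, i.e. exactly at the edge.
response = img[:, 1:] - img[:, :-1]
print(response[0])   # [0. 0. 1. 0. 0.]  -> nonzero only at the edge
```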

Well, I'm not a neuroscientist, so I don't know anything about how a mammalian brain actually processes visual information, but from a pure physics standpoint what is hitting the retina is light of various frequencies, right? That's what "color" means, doesn't it? Different frequencies of light create the sensation of different colors. We could use other forms of input of course, if we have the appropriate sensors: radio waves (e.g. from a radio telescope), X-rays, or infrared rays. But when we feed those sorts of images to an algorithm, they have typically been converted into pixels of some form, either color or grayscale.
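For reference, "converted to grayscale" typically means a weighted sum of the three channels; the ITU-R BT.601 luma weights are one common convention (a sketch):

```python
import numpy as np

rgb = np.array([[[255, 0, 0],      # pure red pixel
                 [0, 255, 0]]],    # pure green pixel
               dtype=np.float64)

# Classic BT.601 luma weights: green contributes most to perceived
# brightness, blue the least.
gray = rgb @ np.array([0.299, 0.587, 0.114])
print(gray)   # the green pixel comes out much brighter than the red one
```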


Hi @paulinpaloalto. I just read the thread where you explain the vectorization order, but when I reshape the images using order='F' I get nowhere near the accuracy I get when using the standard 'C' order for the test set. Shouldn't the accuracy be the same?

@jalvespinto: Yes, it should be the same. Are you sure that you used the “F” order consistently on both the training and test data? That’s the point: as long as you are consistent in the way you do the flattening, the results should be equivalent. I have run this experiment using the cat image dataset in Week 2 and Week 4 of Course 1 and verified that I get the same accuracy results, so I am not just saying this as a theoretical statement: I have confirmed that it works.
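One way to see why consistency is all that matters: flattening with order='F' is just a different fixed permutation of the features, applied identically to every image, so a model simply learns its weights in the permuted order. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(5, 4, 4, 3))   # 5 tiny "images"

flat_c = X.reshape(5, -1)                  # row i = image i, C order
flat_f = X.reshape(5, -1, order='F')       # row i = image i, F order

# The two layouts differ, but only by one fixed permutation of the
# 4*4*3 = 48 features, the same permutation for every image:
perm = np.arange(4 * 4 * 3).reshape(4, 4, 3).reshape(-1, order='F')
print(np.array_equal(flat_f, flat_c[:, perm]))   # True
```

Apply either layout to both training and test data and every feature column keeps a consistent meaning; mix them and the columns no longer line up.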

Which exercise are you using when you don’t see equivalent results?

tks @paulinpaloalto. I was not changing the order for the test set :joy:

Great! I’m relieved to hear that there is an easy answer. :man_facepalming: