In the graded exercise of week 2, might we just transform the cat picture into a color histogram?

There is a lot to unpack in the Week 2 graded quiz :grin: but here is something that occurred to me:

Our way of generating an input vector X from a cat/non-cat picture, which happens to be to “just align the pixels into a single column vector”, destroys all geometric relationships inherent in the input raster image. Basically, we are considering only the pixel colors originally in the image and how prevalent each color is when deciding whether we are in the presence of a cat or a non-cat (or do we? On second thought, a “bright center” might indicate “cat”, and so the weights dealing with whatever pixels were previously “in the center” become important…)
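To make the flattening concrete, here is a tiny sketch (using NumPy and a toy 2×2 image, both my own choices for illustration) of what “align the pixels into a single column vector” does to the 2D layout:

```python
import numpy as np

# Toy 2x2 RGB image: shape (height, width, channels)
img = np.arange(12).reshape(2, 2, 3)

# The week-2 style flattening: one long column vector
x = img.reshape(-1, 1)
print(x.shape)  # (12, 1)

# Pixels (0, 1) and (1, 0) are equally close to (0, 0) in the 2D grid,
# but after flattening their entries sit at different offsets in x:
# the neighborhood structure survives only implicitly, in the index order.
```

Note that the flattening does preserve *position* (entry i of x always comes from the same pixel), which is why per-position weights like the “bright center” ones can still be learned; what is lost is any built-in notion of which pixels are neighbors.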

Might we as well generate a histogram of pixel colors from the input raster image and train the logistic regression predictor on that?

One could imagine, for example, a vector X of 512 “color bins” (3 bits each for R, G and B, so 8 × 8 × 8 = 512 bins), where each bin holds a value between 0.0 and 1.0 according to the fraction of the image’s pixels whose color falls into that bin.
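A minimal sketch of that feature extraction, assuming NumPy and a uint8 RGB image (the function name and the `bits` parameter are my own):

```python
import numpy as np

def color_histogram(image, bits=3):
    """Quantize each channel to `bits` bits and count pixels per color bin.

    image: uint8 array of shape (height, width, 3).
    Returns a normalized vector of length (2**bits)**3 (512 for bits=3).
    """
    levels = 2 ** bits                          # 8 levels per channel
    # Keep only the top `bits` bits of each 0-255 channel value
    q = image.astype(np.uint32) >> (8 - bits)   # indices in [0, 7]
    # Combine the three channel indices into one bin id in [0, 511]
    bin_ids = (q[..., 0] * levels + q[..., 1]) * levels + q[..., 2]
    hist = np.bincount(bin_ids.ravel(), minlength=levels ** 3)
    return hist / bin_ids.size                  # fractions summing to ~1.0

# Example: a random 64x64 "image"
img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
x = color_histogram(img)
print(x.shape)  # (512,)
```

The resulting x could then be fed to the same logistic regression code from the assignment, just with n_x = 512 instead of 64 × 64 × 3.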

Reshaping a picture to 1D does destroy spatial information. You’re welcome to try any feature engineering approach that improves model performance.

Please wait till you get to Course 4, which is all about vision-related models.
The approach there is quite different, since the input image is fed to the NN as a 3D array, i.e. (height, width, color channels).
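In other words, the spatial layout is kept rather than flattened; a batch of m images arrives with its shape intact, something like (illustrative shapes only):

```python
import numpy as np

# A batch of 8 RGB images of 64x64, kept 3D per image:
# shape (m, height, width, channels)
batch = np.zeros((8, 64, 64, 3))
print(batch.shape)  # (8, 64, 64, 3)
```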
