Computer vision and RGB: why do we feed images that way?

I have a question about how neural networks handle images in computer vision.

As I understand it, we take the red, green, and blue values and feed them to the model. Wouldn’t that mean the colors of the first pixel (x0) get split apart when the image is flattened into a one-dimensional array? For example, the red value at index 0, the green at index 65, and the blue at index 129?

If so, the model won’t be able to tell that these values belong to the same pixel. Wouldn’t it make sense to work around this to improve the model’s capabilities? After all, a human being sees the three values together (as if seeing the hue value of HSV).

NNs work this way because that’s the native format of a color image. Image sensors create a separate output matrix for each color.

The human brain works out the relationship. So does a NN.

This is also how the human vision system works. Different cells in the retina respond to different colors.

It helps that the different colors for the same pixel are in the same relative position in each matrix.
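To make the "same relative position" idea concrete, here is a minimal sketch in plain Python. The 2x2 image, its values, and the helper name `pixel_at` are made up for illustration; real pipelines would typically use an array library instead of nested lists.

```python
# Hypothetical 2x2 image stored as three separate channel matrices.
# The values are arbitrary and only serve the illustration.
red = [[255, 0], [10, 20]]
green = [[128, 0], [30, 40]]
blue = [[0, 255], [50, 60]]


def pixel_at(row, col):
    """Return the (R, G, B) triple of one pixel by reading the same
    (row, col) position in each channel matrix."""
    return (red[row][col], green[row][col], blue[row][col])


top_left = pixel_at(0, 0)
bottom_right = pixel_at(1, 1)
print(top_left, bottom_right)
```

The three values of a pixel live in different matrices, but they always share one (row, col) address, which is the regularity a network can pick up on.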

This is an interesting point. You’re right that in some cases we need to preprocess the data in a way that seemingly destroys the geometric relationships between the pixels, and between the colors within each pixel. But as Tom says, the networks are able to learn the patterns even when the images are “flattened” into vectors, as we need to do with Fully Connected networks. Of course, ConvNets are more powerful and can take images in their native format with the geometry still intact, even though (as you say) the colors of each pixel are contained in separate data items.

To go one level deeper here, have a look at this thread about the “flattening” of images for the Logistic Regression assignment in DLS C1 W2. If you read all the way through the thread, you’ll see a post later that makes the point that there is more than one way to do the “unrolling” of the images. It turns out that the network can learn the patterns with either orientation, but of course it requires that you be consistent in how you handle all your image data.
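As a rough illustration of the two "unrolling" orders, the sketch below computes where each color value of a pixel lands in the flattened vector. It assumes an 8x8 RGB image (64 pixels); the function names `planar_index` and `interleaved_index` are hypothetical and not taken from the course notebooks.

```python
# Assumed image size for illustration: 8 x 8 pixels, 3 color channels.
HEIGHT, WIDTH, CHANNELS = 8, 8, 3


def planar_index(row, col, channel):
    """Channel-by-channel unrolling: all reds, then all greens, then all blues."""
    return channel * HEIGHT * WIDTH + row * WIDTH + col


def interleaved_index(row, col, channel):
    """Pixel-by-pixel unrolling: R, G, B of pixel 0, then R, G, B of pixel 1, ..."""
    return (row * WIDTH + col) * CHANNELS + channel


# Positions of the red, green, and blue values of the first pixel.
first_pixel_planar = [planar_index(0, 0, c) for c in range(CHANNELS)]
first_pixel_interleaved = [interleaved_index(0, 0, c) for c in range(CHANNELS)]
print(first_pixel_planar)       # [0, 64, 128]
print(first_pixel_interleaved)  # [0, 1, 2]
```

In either order, the offset between the red, green, and blue values of a pixel is the same for every pixel (64 apart in the first case, adjacent in the second). That fixed pattern is why being consistent across all images matters more than which order is chosen.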

Also note that there are other image formats besides RGB: CMYK and greyscale are two examples, and some formats like PNG support a fourth channel called “alpha” that deals with transparency.

I see, you’ve got a very good point.

Thank you