I have a question about how neural networks deal with images in computer vision.
They take the Red, Green, and Blue values and feed them to the model. Wouldn’t that mean the first pixel’s (x0) colours get split apart when the image is combined into a one-dimensional array? For example, the red value ends up at index 0, the green at index 65, and the blue at index 129?
Because if so, the model won’t be able to understand that they are in fact the same pixel. Wouldn’t it make sense to work around this to enhance its capabilities? After all, a human being does see them together (as if perceiving the hue value of HSV).
This is an interesting point. You’re right that in some cases we need to preprocess the data in a way that seemingly destroys the geometric relationships between the pixels, and between the colour values within each pixel. But, as Tom says, networks are able to learn the patterns even when the images are “flattened” into vectors, as we must do for Fully Connected networks. ConvNets are more powerful and can take images in their native format with the geometry still intact, even though (as you say) each pixel’s colour values are stored as separate data items.
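To make the index arithmetic concrete, here is a small NumPy sketch (the 8×8 image size is just an illustrative assumption). It shows that flattening a channels-last (H, W, C) image actually keeps each pixel’s R, G, B adjacent, whereas flattening a channels-first (C, H, W) layout scatters them to indices 0, H·W, and 2·H·W — the situation described in the question:

```python
import numpy as np

# An 8x8 RGB image: 64 pixels, 3 channels, filled with distinct values.
h, w = 8, 8
img_hwc = np.arange(h * w * 3).reshape(h, w, 3)  # channels-last (H, W, C)

# Flattening a channels-last image keeps each pixel's R, G, B adjacent:
flat_hwc = img_hwc.reshape(-1)
assert list(flat_hwc[:3]) == list(img_hwc[0, 0])  # pixel (0,0) at indices 0,1,2

# Flattening a channels-first (C, H, W) layout scatters them:
img_chw = img_hwc.transpose(2, 0, 1)
flat_chw = img_chw.reshape(-1)
# pixel (0,0)'s red is at index 0, green at h*w = 64, blue at 2*h*w = 128
assert flat_chw[0] == img_hwc[0, 0, 0]
assert flat_chw[h * w] == img_hwc[0, 0, 1]
assert flat_chw[2 * h * w] == img_hwc[0, 0, 2]

# A fully connected layer sees the same 192 numbers either way; during
# training it can learn the correlations regardless of where each
# channel value lands in the flattened vector.
```

Either layout carries exactly the same information, which is why the network can still learn which inputs belong together even when they are far apart in the vector.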
Also note that there are colour representations other than RGB: CMYK and greyscale are two examples, and some formats such as PNG include a fourth channel called “alpha” that encodes transparency.
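From the network’s point of view, these representations differ only in how many channels each pixel carries. A quick sketch (the 8×8 size is again just illustrative):

```python
import numpy as np

# Common channel layouts, channels-last convention (H, W, C):
gray = np.zeros((8, 8))        # greyscale: one value per pixel
rgb  = np.zeros((8, 8, 3))     # RGB: three colour channels
rgba = np.zeros((8, 8, 4))     # RGBA: RGB plus an alpha (transparency) channel
cmyk = np.zeros((8, 8, 4))     # CMYK: four ink channels, common in print

# To a ConvNet these are just tensors of different channel depth; the
# first convolutional layer's weights are simply sized to match it.
for img in (gray, rgb, rgba, cmyk):
    print(img.shape)
```

So supporting a new colour format usually means nothing more than changing the input channel count of the first layer.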