It’s an interesting point: whether you need color depends on what your classification task is. If it’s recognizing shoes, then you’re probably right that greyscale images would suffice and save you some compute and storage costs. If the task is to recognize brown shoes, or to tell a red light from a green light, then it’s a different matter. Of course, in reality that red light vs. green light case may be a red herring (pun intended): you can probably recognize the state of a traffic light by its orientation alone. But you get the point …
There are some interesting follow-on questions and experiments one could do here. E.g., suppose we went back to the cat classification exercise with the 4-layer network in C1 W4 and ran it again after converting all the input training and test data to greyscale: how would that affect the classification accuracy? Of course, that data might not be a great basis for the experiment, given that the datasets are far too small to give a realistic picture. FWIW, here’s a thread about some experiments tweaking those small datasets and how much perturbation you can see in the results. If we really wanted a valid experiment on whether color helps, we’d need to find datasets of a more realistic size.
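If anyone wants to try that greyscale experiment, here’s a minimal sketch of the conversion step. It assumes the images are loaded as NumPy arrays of shape (m, height, width, 3), as in the C1 W4 assignment; the function name `to_grayscale` and the variable names in the usage comment are just hypothetical, not part of the course code. It uses the standard ITU-R BT.601 luminosity weights for the RGB-to-grey mapping.

```python
import numpy as np

def to_grayscale(images):
    """Convert a batch of RGB images of shape (m, h, w, 3) to greyscale.

    Uses the standard ITU-R BT.601 luminosity weights. The channel
    dimension is kept (size 1), so the rest of the pipeline only needs
    its input-size constant updated, not its reshaping logic.
    """
    weights = np.array([0.299, 0.587, 0.114])
    # Contract the channel axis against the weight vector: (m, h, w, 3) -> (m, h, w)
    gray = np.tensordot(images, weights, axes=([-1], [0]))
    return gray[..., np.newaxis]

# Hypothetical usage, assuming train_x_orig has shape (m, 64, 64, 3)
# as in the C1 W4 cat dataset:
#   train_x_gray = to_grayscale(train_x_orig)
# Note the flattened input size then drops from 64*64*3 = 12288 to
# 64*64*1 = 4096, so the first entry of layers_dims must change to match.
```

Everything else in the training loop should work unchanged, since the conversion only shrinks the input dimension.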