Hello everyone. I just finished the course materials for Week 2. While learning about the first step of processing pictures, a question came to mind. Since it has not been addressed later in the course, I decided to post it here.

Recall that when you have a square picture in the training set, Dr. Ng said we convert it to a column vector of length 256 * 256 * 3. This is because every colored pixel is assigned a value in each of the red, green, and blue channels. However, is that necessary?
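For context, here is a minimal NumPy sketch of that flattening step, assuming the picture is stored as a height x width x channels array (the random image is just a stand-in for a real training example):

```python
import numpy as np

# Hypothetical 256 x 256 RGB image: height x width x 3 channels,
# each entry an intensity in [0, 255].
image = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)

# Flatten into the column vector that serves as the network input.
x = image.reshape(-1, 1)

print(x.shape)  # (196608, 1), i.e. 256 * 256 * 3 rows
```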

If I understand correctly, machine learning tries to solve problems the way human beings do. On the other hand, as someone not particularly strong in vision, I can decide whether a black-and-white photo contains a cat just as easily as I can for its colored version. (I am certain this is true in 99.9% of cases.)

In addition, I expect the total computational cost to grow at least quadratically in the vector length as it gets large. So if we deal only with the black-and-white picture, the input vector is just one third as long, and there are only one third as many weights w to solve for during optimization. In short, the amount of computation in the later stages could be no greater than one ninth of the current amount.
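To make my arithmetic concrete (under my quadratic-growth assumption, which may not hold for every model):

```python
n_color = 256 * 256 * 3  # length of the color input vector
n_gray = 256 * 256       # length of the grayscale input vector

ratio = n_color / n_gray
print(ratio)       # 3.0  -> one third as many input features (and weights)
print(ratio ** 2)  # 9.0  -> the assumed quadratic saving in computation
```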

We also need to consider the trade-off here. To begin with, converting a colored picture adds one extra layer of processing, but I would guess the extra cost is at most linear in the number of pixels, since we are mapping a 3-tuple to a 1-tuple at each pixel under some fixed rule.
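The fixed 3-tuple-to-1-tuple rule I have in mind could be, for example, a weighted sum over the channels. The coefficients below are the common ITU-R BT.601 luma weights, which I picked just for illustration; the course does not prescribe any particular rule, and any fixed linear map would make the same point about linear cost:

```python
import numpy as np

def to_grayscale(image):
    """Map each RGB pixel (a 3-tuple) to a single intensity (a 1-tuple).

    Uses the common BT.601 luma weights; the cost is a fixed number
    of multiply-adds per pixel, i.e. linear in the number of pixels.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return image @ weights  # shape (H, W, 3) -> (H, W)

image = np.ones((256, 256, 3))  # toy all-white image
gray = to_grayscale(image)
print(gray.shape)  # (256, 256): one value per pixel instead of three
```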

The real question is the training effect: how many uncolored pictures would we need to feed the computer to achieve comparable precision in cat detection? By the estimate above, we still come out ahead as long as we need no more than about nine times as many uncolored pictures.

Would this be an interesting direction to explore? Please let me know your opinions. Thank you for your time.