# A possible improvement for cat recognition

Hello everyone. I just finished the course materials in Week 2. When learning about the first step of processing pictures, a question came to mind. Since it was not addressed later in the course, I decided to post it here.

Recall that when you have a square picture in the training set, Dr. Ng said we convert it to a column vector of length 256 * 256 * 3. This is because every colored pixel can be assigned a value on each of the red, green, and blue axes. However, is that necessary?

If I understand correctly, machine learning tries to solve problems the way human beings do. On the other hand, as someone not particularly strong in vision, I can decide whether a black-and-white photo contains a cat just as easily as I can for its colored version. (I am certain this is true in 99.9% of cases.)

In addition, I can see that the total computational cost grows at least quadratically in the vector length as it gets large. So if we deal only with the black-and-white picture, the vector is just one third as long, and there are only one third as many weights w to solve for in the optimization. In short, the amount of computation in the later stages could be no more than one ninth of the current amount.
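As a quick sanity check of that arithmetic, here is a trivial sketch (the ratio is independent of the actual picture size, so the function below takes the side length as a parameter):

```python
# Sanity check of the ratio argument: the grayscale vector is one third
# the length of the RGB vector, so any cost that grows quadratically in
# the vector length shrinks by a factor of nine.
def vector_lengths(n):
    """Return (RGB length, grayscale length) for an n x n square picture."""
    return n * n * 3, n * n

rgb_len, gray_len = vector_lengths(64)
print(rgb_len // gray_len)         # 3
print((rgb_len // gray_len) ** 2)  # 9
```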

We also need to consider the trade-off here. To begin with, converting a colored picture adds one extra layer of neurons, but I would guess the extra cost is at most linear, since we are mapping a 3-tuple to a 1-tuple at each pixel under some fixed rule.

The real question is the training effectiveness: how many uncolored pictures do we need to feed the computer to achieve comparable precision in cat detection? We are still likely to speed things up if we need no more than ten times as many uncolored pictures.

Would this be an interesting finding? Please let me know your opinions. Thank you for your time.

These are interesting ideas! One minor correction: the images we are dealing with here are 64 x 64 x 3, not 256 x 256 x 3.

It is a good point that the human eye can probably recognize cats just as well with a black and white image as with color. It would be a pretty easy experiment to run with the dataset here: implement a conversion to produce the same dataset but with grayscale images. Here's a StackExchange article that gives some pretty straightforward methods to do that conversion. Then you can run the experiment: do the training with the grayscale images and compare both the CPU cost of the training and the resulting accuracy to what we get with the RGB images.

If I had to guess, I would bet that your point about how easily the human eye sees a cat in a black and white picture will translate to the Logistic Regression algorithm as well. Note that the conversion from RGB to grayscale is a one-time cost, and it is not that expensive.

If you end up running this experiment, it would be really interesting to know what results you get. You could do it here in Week 2 with Logistic Regression and then again in Week 4 with a real 4-layer Neural Network. It's great to try experiments like this: you always learn something interesting in the process. Thanks for suggesting these ideas!
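For reference, a vectorized version of the conversion could look like the sketch below. This is only an illustration under a couple of assumptions: the input is the unflattened dataset with shape (m, 64, 64, 3), the function name is made up, and the 0.2989 / 0.5870 / 0.1140 luminosity weights are one of the standard choices for this kind of conversion:

```python
import numpy as np

# Hedged sketch: convert an (m, 64, 64, 3) RGB image array to grayscale
# using standard luminosity weights. `images` stands for the unflattened
# Week 2 dataset; the name `rgb_to_gray` is illustrative, not from the course.
def rgb_to_gray(images):
    weights = np.array([0.2989, 0.5870, 0.1140])
    gray = images @ weights        # weighted sum over the trailing channel axis
    return gray[..., np.newaxis]   # keep a channel axis of size 1: (m, 64, 64, 1)
```

Because it is vectorized, this bears out the point above: the conversion is a cheap one-time cost compared to training.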

When I try to rewrite X_train as
0.2989*X_trainR + 0.5870*X_trainG + 0.1140*X_trainB,
where X_trainR/G/B stand for the top/middle/bottom thirds of the rows of the matrix X_train, so that I can compress the colored pictures, the system tells me some matrices have mismatched sizes.

When I examine the actual size of X_train, Python says (4, 3).

Isn't that strange? We are given that the first coordinate of the shape of X_train should equal 64 squared times 3 (= 12288), which cannot possibly be 4. Also, we have at least 25 pictures in the training set, which is much more than 3.

The code we are writing is (supposed to be) general, in the sense that it can handle any number of features or samples, right? Not all the test cases in the notebook use the full-sized data. If you're just writing a "unit test" for some function, why bother with 12288 entries? It's much easier to check your work with 12 entries.
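To make that concrete, here is an illustrative sketch (the function and the tiny matrix are made up for the example, not taken from the notebook). Because the code is written generally, the same function runs on a tiny (4, 3) test matrix and on the full 12288-row training matrix alike:

```python
import numpy as np

# Made-up example of a shape-agnostic helper: divide each column by its
# Euclidean norm. Nothing here depends on the number of rows or columns.
def normalize_columns(X):
    return X / np.linalg.norm(X, axis=0, keepdims=True)

# The same (4, 3) shape that Python reported for the unit-test X_train.
X_small = np.arange(1.0, 13.0).reshape(4, 3)
print(normalize_columns(X_small).shape)  # (4, 3)
```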

So, is there any way for me to convert the RGB pictures in the Week 2 data to grayscale?

Alternatively, how can I convert .jpg files into vectors?

The logic for converting the RGB images into vectors is already given to you in the notebook. Did you read the "flatten" section early in the notebook? Note, though, that you need to be able to index the 3 color values for each pixel. My recommendation is that you write 3 nested for loops, like this:

```
for each sample:
    for each vertical position from 0 to 63:
        for each horizontal position from 0 to 63:
            generate the output grayscale value by indexing channels 0, 1, and 2
```

At the end of that conversion, you'll have a 4D array of shape m x 64 x 64 x 1. Then unroll those into vectors using logic similar to what is already given to "flatten" the RGB images.
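As a sketch of that three-loop recipe, something like the following could work. The names here are illustrative: `X_orig` stands for the unflattened images of shape (m, 64, 64, 3), and the 0.2989 / 0.5870 / 0.1140 weights are the ones quoted earlier in the thread:

```python
import numpy as np

# Illustrative implementation of the three nested loops described above.
def rgb_to_gray_loops(X_orig):
    m = X_orig.shape[0]
    X_gray = np.zeros((m, 64, 64, 1))
    for i in range(m):                     # each sample
        for v in range(64):                # each vertical position
            for h in range(64):            # each horizontal position
                r, g, b = X_orig[i, v, h]  # index channels 0, 1, and 2
                X_gray[i, v, h, 0] = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return X_gray

# Unrolling then mirrors the notebook's "flatten" step: one column per
# sample, giving shape (4096, m) instead of (12288, m).
def flatten_gray(X_gray):
    return X_gray.reshape(X_gray.shape[0], -1).T
```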