Handwritten images

I understand linear regression, with the example of house prices by square footage.
I understand logistic regression, with binary classification.

I would also like to understand how a neural network can identify the digit in an image. The lab shows a neural network and a set of handwritten digit images.

I’ve read a little about it, and it says the images must first be pre-processed (converted to black and white) so that the color intensity of each pixel can be evaluated.
But it’s not clear, at least to me, how a neural network built from math functions can detect a curve in a digit, or tell whether the digit is a 9 or a 4.

It can’t be magic, even if it seems like it; there must be an explanation of how it works under the hood.

If you can recommend a document to read so I can understand it more deeply, I will read it.
If the answer is that I will understand this in the Deep Learning course, that’s fine too; I will take it.

I like to understand everything I do.



Hi @gmazzaglia, this is a great question, and one I spent a lot of time trying to figure out when I started. I will try to explain how this works.

In this example you have a picture in which each pixel has an associated number. The number itself is not important; what matters is that the numbers are a representation of the image. You convert the picture into numbers: a black pixel gets the same number as any other black pixel, and a white pixel gets a different number. The algorithm recognizes the pattern and learns how to discriminate between the different patterns of numbers. So it is not magic, it is math.
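To make the "picture as numbers" idea concrete, here is a minimal sketch of the conversion step. It assumes 8-bit grayscale pixels where 0 means background and 255 means full ink intensity; the tiny 2x3 image and the helper name `image_to_vector` are hypothetical, not taken from the lab.

```python
# Minimal sketch, assuming 8-bit grayscale pixels (0 = background, 255 = full ink).
# The 2x3 image and the helper name are hypothetical, for illustration only.

def image_to_vector(image, max_value=255):
    """Flatten a 2-D grid of pixel values row by row and scale each value to [0, 1]."""
    return [pixel / max_value for row in image for pixel in row]

# A tiny hypothetical "1": ink only in the middle column.
image = [[0, 255, 0],
         [0, 255, 0]]

vector = image_to_vector(image)
print(vector)       # [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
print(len(vector))  # 6 numbers, one per pixel
```

Identical pixels always produce identical numbers, so the network only ever sees a list of intensities; any "shape" it detects has to come from patterns in those values.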

Let’s see an example.

Suppose you have a picture of the number 1.

Let’s say the picture has only six pixels (obviously an exaggeration for the sake of the example):

Column 1 Column 2 Column 3
0.002 1.313 0.002
0.001 1.312 0.002

The algorithm looks at this pattern and predicts that it is similar to other patterns that represent the number one, which could look something like this:

Column 1 Column 2 Column 3
0 1 0
0 1 0

Note that the numbers in the matrix represent the intensity of the picture at each pixel. So if the picture shows the number zero, the intensity will be 1 at the pixels where the writing is present.
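One way to see how "math functions" can match such a pattern is a single neuron: a weighted sum of the pixel intensities plus a bias, passed through a sigmoid. The sketch below is illustrative only; the weights and bias are hypothetical, hand-picked values (a trained network would learn its own), chosen to reward ink in the middle column and penalize ink in the side columns.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(pixels, weights, bias):
    """One neuron: weighted sum of pixel intensities plus bias, squashed by sigmoid."""
    z = sum(w * x for w, x in zip(weights, pixels)) + bias
    return sigmoid(z)

# Hypothetical hand-picked weights for a 2x3 image flattened row by row:
# positive where a "1" has ink (middle column), negative where it has none (sides).
weights = [-1.0, 2.0, -1.0,
           -1.0, 2.0, -1.0]
bias = -2.0

one_like = [0.002, 1.313, 0.002,
            0.001, 1.312, 0.002]   # the six pixel values from the example above
not_one = [1.0, 0.0, 1.0,
           1.0, 0.0, 1.0]          # ink on the sides, none in the middle

print(round(neuron(one_like, weights, bias), 3))  # about 0.962: looks like a "1"
print(round(neuron(not_one, weights, bias), 3))   # about 0.002: does not
```

Each weight acts like a stencil over the pixels: inputs that line up with the positive weights push the output toward 1, and inputs that fall on the negative weights push it toward 0.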

Please let me know if this helps!

It recognizes patterns in the numeric values of the pixels.
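In a real network the weights are not chosen by hand; they are learned from labeled examples. A minimal sketch of that learning step, assuming a single sigmoid neuron (i.e. logistic regression) trained with batch gradient descent on a hypothetical toy dataset of 2x3 images, might look like this:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(examples, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    n, m = len(examples[0]), len(examples)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(examples, labels):
            err = predict(w, b, x) - y   # gradient of the loss with respect to z
            for j in range(n):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

# Hypothetical toy data: 2x3 images flattened row by row; label 1 means "this is a 1".
examples = [
    [0, 1, 0, 0, 1, 0],      # a clean "1": ink in the middle column
    [0, 0.9, 0, 0, 1, 0.1],  # a slightly messy "1"
    [1, 0, 0, 1, 0, 0],      # ink on the left
    [0, 0, 1, 0, 0, 1],      # ink on the right
    [1, 1, 1, 0, 0, 0],      # a horizontal stroke
    [1, 0, 1, 1, 1, 1],      # something else
]
labels = [1, 1, 0, 0, 0, 0]

w, b = train(examples, labels)
print([round(wj, 2) for wj in w])  # learned weights, one per pixel
print(predict(w, b, [0.002, 1.313, 0.002, 0.001, 1.312, 0.002]))
```

Training only nudges the numbers in `w` and `b` to reduce the error on the examples; the resulting weights end up favoring the middle column, which is the "pattern" for a 1 in this toy data. Telling a 9 from a 4 works on the same principle, just with many more pixels, more neurons, and one output per digit.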


Understood, thanks @pastorsoto. It’s clearer now. I have a question about the activation functions used to detect that pattern.
Were they selected by trial and error, meaning you try different combinations of activation functions to find the best one? Or is there a pattern here too, meaning some combinations of functions work better for images than others?


I prefer to use the word “experimentation”.

The only firm rule about activations is “don’t use ReLU at the output layer”.

Rules of thumb:

  • ReLU trains faster than sigmoid or tanh because it doesn’t need to compute very much (especially during backpropagation).
  • ReLU is inefficient (it outputs zero for every negative input), so you have to use a lot more ReLU units than either sigmoid or tanh units.
  • sigmoid’s output range is limited to 0 to 1, so it maps well to logical levels.
  • tanh’s output range is -1 to +1, so it works nicely for real-valued outputs in the output layer.
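The rules of thumb above can be illustrated with a small sketch of the three activations and the derivatives used during backpropagation. This is plain Python for illustration, not any particular framework’s implementation; the derivative of ReLU at exactly 0 is taken as 0 here, which is a common convention but an assumption.

```python
import math

def relu(z):
    return max(0.0, z)                 # zero for every negative input

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))  # output always between 0 and 1

def tanh(z):
    return math.tanh(z)                # output always between -1 and +1

# Derivatives used during backpropagation.
def relu_grad(z):
    return 1.0 if z > 0 else 0.0       # just a comparison, no exponential

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)               # requires evaluating an exponential

def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2     # also requires an exponential internally

for z in (-3.0, 0.0, 3.0):
    print(f"z={z:+.1f}  relu={relu(z):.3f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):+.3f}")
```

Running the loop shows ReLU returning 0 for the negative input, sigmoid staying between 0 and 1, and tanh staying between -1 and +1, which is where the "logical levels" and "real-valued output" remarks come from.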