Handwritten images

gmazzaglia · June 12, 2024, 11:25pm

Hello,
I understood linear regression, using the example of house prices versus size in square feet.
I understood logistic regression with binary classification.

I would also like to understand how a neural network can determine a number in an image. The lab shows this neural network and these images

I’ve read a little about it, and it seems the images must be pre-processed into black and white so that the intensity of each pixel can be evaluated.
But it’s not clear, at least to me, how the neural network with math functions can determine a curve in a number, or whether the number is 9 or 4.

It cannot be magic, even if it seems like it. There must be an explanation of how it works under the hood.

If you can recommend a document that would help me understand this more deeply, I will read it.
If you tell me that I will understand this in the Deep Learning course, that’s fine too; I will take it.

I like to understand everything I do.

Thanks.
Regards.
Gus

pastorsoto · June 13, 2024, 12:30am

Hi @gmazzaglia, this is a great question, and one I spent a lot of time trying to figure out when I started. I will try to explain how this works.

In this example you have a picture, and each pixel has a number associated with it. The number itself is not important; what matters is that the numbers together are a representation of the image. A black pixel will have the same number as any other black pixel, and a white pixel will have a different number. The algorithm recognizes patterns in these numbers and learns how to discriminate between the different patterns. So it is not magic, it is math.

Let’s see an example

You have the picture of the number 1

Let’s say it has only six pixels (obviously an exaggerated simplification).

Column 1	Column 2	Column 3
0.002	1.313	0.002
0.001	1.312	0.002

The algorithm will look at this pattern and predict that it resembles other patterns that represent the number one, which could look something like this:

Column 1	Column 2	Column 3
0	1	0
0	1	0

Note that the numbers in the matrix represent the intensity of the picture at each pixel. Whatever digit is drawn, even a zero, the values will be close to 1 at the pixels where the writing is present and close to 0 elsewhere.
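
To make the six-pixel example a bit more concrete, here is a minimal sketch of how a single unit could turn those pixel values into a score. The weights and bias are hand-picked assumptions purely for illustration (positive on the middle column, negative elsewhere); in a real network they are learned during training, and the lab's network has many such units across several layers.

```python
import math

# The six-pixel "image" of a 1 from the example above, flattened row by row.
pixels_one = [0.002, 1.313, 0.002,
              0.001, 1.312, 0.002]

# Hypothetical weights for one "does this look like a 1?" unit.
# Assumption: chosen by hand here; a real network learns them from data.
weights = [-2.0, 2.0, -2.0,
           -2.0, 2.0, -2.0]
bias = -2.0


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def unit_output(pixels, weights, bias):
    """Weighted sum of the pixel values plus a bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, pixels)) + bias
    return sigmoid(z)


# A vertical stroke in the middle column scores high...
score_one = unit_output(pixels_one, weights, bias)

# ...while a horizontal stroke across the top row scores low.
pixels_bar = [1.0, 1.0, 1.0,
              0.0, 0.0, 0.0]
score_bar = unit_output(pixels_bar, weights, bias)

print(f"vertical stroke:   {score_one:.3f}")
print(f"horizontal stroke: {score_bar:.3f}")
```

The "pattern recognition" is just arithmetic: pixels that line up with large positive weights push the score toward 1, and pixels that land on negative weights push it toward 0.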

Please let me know if this helps!

TMosh · June 13, 2024, 1:40am

It’s recognizing the patterns of the numeric values of the pixels.

gmazzaglia · June 13, 2024, 1:53am

Understood, thanks @pastorsoto. It’s clearer now. I have a question about the activation functions used to detect that pattern.
Were they selected by trial and error? That is, do you try combinations until you find the best one, or is there a pattern here too, where certain combinations of functions work better for images than others?

Thanks.
Gus

TMosh · June 13, 2024, 5:42pm

I prefer to use the word “experimentation”.

The only firm rule about activations is “don’t use ReLU at the output layer”.

Rules of thumb:

ReLU trains faster than sigmoid or tanh because it doesn’t need to compute very much (especially during backpropagation).
ReLU is inefficient (no output for negative inputs), so you have to use a lot more ReLU units than either sigmoid or tanh units.
sigmoid’s output range is limited to 0 to 1. So it maps well to logical levels.
tanh’s output range is -1 to +1. So it works nicely for real number values in the output layer.
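
As a rough illustration of the output ranges mentioned in these rules of thumb, the three activations can be written in a few lines of plain Python. This is only a sketch of the standard definitions, not code from the lab.

```python
import math


def relu(z):
    """Returns 0 for negative inputs and passes positive inputs through."""
    return max(0.0, z)


def sigmoid(z):
    """Output always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Output always lies strictly between -1 and +1."""
    return math.tanh(z)


# Compare the three functions on a negative, a zero, and a positive input.
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  relu={relu(z):.3f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):+.3f}")
```

Note how ReLU gives exactly 0 for the negative input, which is the "no output for negative inputs" behaviour described above, while sigmoid and tanh still produce a non-zero value there.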
