I have just figured out that the input to each layer is processed by logistic regression. The output of logistic regression represents the odds of belonging to a specific category. With this in mind, the threshold for making the final decision should be 1 rather than 0.5.
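For reference, the sigmoid output of logistic regression is usually read as a probability p, and the odds are p / (1 - p), which works out to exp(z) for the raw score z. So a cutoff of 0.5 on the probability and a cutoff of 1 on the odds give the same decision. A minimal sketch in plain Python (the helper names are illustrative, not from any library):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1.0 - p)

for z in [-2.0, 0.0, 2.0]:
    p = sigmoid(z)
    # The odds equal exp(z), so odds >= 1 exactly when p >= 0.5 (i.e. when z >= 0).
    print(f"z={z:+.1f}  p={p:.3f}  odds={odds(p):.3f}  exp(z)={math.exp(z):.3f}")
```

Because the two cutoffs are equivalent, the familiar 0.5 threshold already corresponds to "odds of at least 1".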

Moreover, to run logistic regression we must have labeled data. In this regard, we would need to know the correct answer for each activation, but we do not have the true y for them.

It would be great if you could explain this in more detail for me.

No. When there are multiple classes, the final decision is to pick the one with the highest probability. No threshold is used in this case.

The threshold of 0.5 only applies if you have a true/false decision.
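The two decision rules can be contrasted in a few lines. This is a minimal sketch, assuming a softmax output layer for the multi-class case and a single sigmoid probability for the binary case; the function names are hypothetical:

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_multiclass(scores):
    """Multi-class: pick the class with the highest probability (no threshold)."""
    probs = softmax(scores)
    return max(range(len(probs)), key=lambda i: probs[i])

def predict_binary(p, threshold=0.5):
    """Binary true/false decision: compare one sigmoid probability to a threshold."""
    return p >= threshold

print(predict_multiclass([1.2, 3.4, 0.5]))  # 1, the index of the largest score
print(predict_binary(0.73))                 # True
```

Since softmax preserves the ordering of the scores, taking the argmax of the probabilities picks the same class as taking the argmax of the raw scores.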

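The labels are the ‘y’ values. They are only compared with the network's final output; the hidden layers learn from the error that is propagated back from the output layer.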
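To see why hidden activations never need a true y of their own, here is a minimal backpropagation sketch in plain Python. It assumes a tiny hypothetical network (two sigmoid hidden units, one sigmoid output unit, binary cross-entropy loss) with made-up weights; the finite-difference check confirms that the hidden-layer gradient, derived only from the output label, is correct:

```python
import math

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    """Hidden layer of sigmoid units, then one logistic-regression output unit."""
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    p = sigmoid(sum(wj * hj for wj, hj in zip(w2, h)) + b2)
    return h, p

def loss(p, y):
    """Binary cross-entropy: the only place the label y appears."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def backward(x, y, W1, b1, w2, b2):
    """Gradients of the loss for every weight, via the chain rule (backpropagation)."""
    h, p = forward(x, W1, b1, w2, b2)
    dz2 = p - y  # output error: the label is used here
    dw2 = [dz2 * hj for hj in h]
    db2 = dz2
    # Each hidden unit's error comes from the output error, not from a label of its own.
    dz1 = [dz2 * w2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    dW1 = [[dz1[j] * xi for xi in x] for j in range(len(h))]
    db1 = list(dz1)
    return dW1, db1, dw2, db2

def nudged(W, j, i, delta):
    """Copy of W with entry W[j][i] shifted by delta."""
    W_new = [row[:] for row in W]
    W_new[j][i] += delta
    return W_new

# Hypothetical single training example and weights
x, y = [0.5, -1.0], 1
W1 = [[0.1, -0.2], [0.4, 0.3]]
b1 = [0.0, 0.1]
w2 = [0.7, -0.5]
b2 = 0.05

dW1, db1, dw2, db2 = backward(x, y, W1, b1, w2, b2)

# Finite-difference check on one hidden-layer weight
eps = 1e-6
loss_plus = loss(forward(x, nudged(W1, 0, 1, eps), b1, w2, b2)[1], y)
loss_minus = loss(forward(x, nudged(W1, 0, 1, -eps), b1, w2, b2)[1], y)
numeric = (loss_plus - loss_minus) / (2 * eps)
print(dW1[0][1], numeric)  # analytic and numeric gradients should agree closely
```

Gradient descent would then subtract a small multiple of these gradients from each weight; at no point does a hidden unit need its own label.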

Hey Sina, I remember arriving at the same conclusion, and things started to make more sense! Great step on your learning journey. It turns out that while each neuron “could” be a logistic regression unit (a weighted sum of its inputs passed through a sigmoid), in practice hidden units usually apply other non-linear activation functions, such as ReLU or tanh. Ultimately, it is difficult or impossible for even the best-trained data scientist to know exactly “how” a hidden feature was engineered, since learning those features is the neural network's job. It finds complex relationships we cannot, and adjusts its weights until its output matches the labels we give it.
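A single neuron can be sketched as a weighted sum followed by an activation; swapping the activation is what separates a logistic regression unit from a typical hidden unit. A minimal illustration with hypothetical inputs and weights:

```python
import math

def neuron(x, w, b, activation):
    """One unit: weighted sum of inputs plus a bias, passed through an activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

x = [1.0, 2.0]
w = [0.5, -1.0]
b = 0.25
# z = 0.5*1.0 + (-1.0)*2.0 + 0.25 = -1.25
print(neuron(x, w, b, sigmoid))    # with sigmoid, this is exactly a logistic regression unit
print(neuron(x, w, b, relu))       # with ReLU, a negative z is clipped to 0.0
print(neuron(x, w, b, math.tanh))  # tanh squashes z into (-1, 1)
```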

Lastly, what is so interesting about the final output layer being a logistic regression unit is that it not only lets us make a boolean decision (true or false, yes or no, 0 or 1), it also gives us the probability of the positive outcome. This works, as you know, because logistic regression uses the sigmoid function, which squashes any score into a value between 0 and 1, so we can see how likely one outcome is compared with the other. Hope this helps! I enjoyed your question!
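Here is a minimal sketch of such an output unit, assuming it receives the last hidden layer's activations h; the activations, weights, and bias below are made up for illustration. It returns both the probability and the yes/no decision obtained by thresholding that probability at 0.5:

```python
import math

def output_unit(h, w, b, threshold=0.5):
    """Logistic-regression output unit applied to the last hidden layer's activations h.

    Returns the probability of the positive class and the yes/no decision.
    """
    z = sum(wi * hi for wi, hi in zip(w, h)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return p, p >= threshold

p, label = output_unit(h=[0.2, 0.9, 0.4], w=[1.5, 2.0, -0.5], b=-1.0)
print(f"P(y=1) = {p:.3f}, P(y=0) = {1 - p:.3f}, predicted label = {int(label)}")
```

The probability is the more informative of the two outputs: a value of 0.51 and a value of 0.99 both produce label 1, but they express very different levels of confidence.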