It is actually harder than one might imagine for a well-trained network to predict exactly 0.500000000000000000000. It is not impossible, but it might take years before we see such a case.
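To show why, here is a rough sketch (NumPy; the spread of the logits is just an assumption for illustration). A sigmoid output equals exactly 0.5 only when its logit is exactly 0.0 as a float, which essentially never happens for a real-valued logit:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Strictly monotonic, so sigmoid(z) == 0.5 only when z == 0.0 exactly.
    return 1.0 / (1.0 + np.exp(-z))

# One million made-up logits, real-valued with some spread around zero,
# standing in for the pre-sigmoid outputs of a trained network.
logits = rng.normal(loc=0.0, scale=2.0, size=1_000_000)
outputs = sigmoid(logits)

print("exactly 0.5:        ", np.sum(outputs == 0.5))                # almost certainly 0
print("within 1e-6 of 0.5: ", np.sum(np.abs(outputs - 0.5) < 1e-6))  # a handful at best
```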
So what you are saying is that this small boundary case has no material impact on the evaluation? The reason I ask is that I am wondering about strong non-linear effects. And thank you for taking the time to review and answer my question.
Just trying to understand you: what would strong non-linear effects impact, and how?
So far, I still think that, yes, it has no material impact on anything, really, because it almost never happens. But I am looking for different viewpoints, so: what would strong non-linear effects impact, and how?
By non-linear effect, maybe you are referring to the effect of the non-linear activation functions in the neural network? If so, how does that relate to the boundary at .5?
Non-linear effect would be any time f(x) → y is non-linear (the worst case is exponential). In physics and weather this is called a runaway process. In this case, f(x) → y is the result of the activation functions. Another way to look at it: recently the coin toss was found not to be .5 (equal probability); there was a slight advantage to the side that is up when the coin is tossed. If some process is slightly biased away from .5, but that effect gets multiplied through subsequent stages, the result may end up greatly biased away from .5. That’s what I mean by non-linear.
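Here is a toy sketch of that multiplication-through-stages idea (Python; the stage function is made up purely for illustration, not a real network layer). Start a hair above .5, as with the coin, and watch the bias compound:

```python
import math

def stage(p, gain=3.0):
    # Hypothetical non-linear stage: fixes 0.5 in place (stage(0.5) == 0.5),
    # but any small deviation from 0.5 grows by roughly `gain` per pass.
    return 1.0 / (1.0 + math.exp(-4.0 * gain * (p - 0.5)))

p = 0.508  # the slight same-side bias reported for coin tosses
for i in range(6):
    print(f"stage {i}: p = {p:.6f}")
    p = stage(p)  # after a few stages, p races toward 1
```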
That being said, not having taken your course, I am really too ignorant to answer your good questions about the non-linear activation functions. I should probably enroll!
It’s fine. I asked about the activation function because I was trying to come up with some possibilities of what you might be talking about, in case it could help move the discussion forward. But that was not necessary, because now you have explained it.
What I see from your explanation is that one process after another biases the final outcome away from .5: it can start at .5, but the outcome is not .5 because of those processes.
In the case of a neural network, I believe it’s a little bit different: it does not start from .5 at the input layer; instead, .5 first appears at the output of the output layer.
I said it does not start from .5 at the input layer because the input layer is just a bunch of arbitrary numbers.
I said .5 first appears at the output of the output layer because we are optimizing that output to be away from 0 when the label is 1, and away from 1 when the label is 0.
The optimization requires only the output of the output layer to carry that property, and that push toward 0 or 1 makes .5 a natural boundary. So .5 first appears at the output of the output layer, and there is no extra process afterwards to bias it.
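To make that concrete, here is a minimal sketch (a toy logistic regression in NumPy; the data and sizes are made up). The cross-entropy loss keeps penalizing outputs near .5, so after training the outputs pile up toward 0 and 1; only the examples closest to the decision boundary stay near .5, and exact equality with .5 essentially never happens:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up, roughly separable data: 2 features, binary labels.
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr = np.zeros(2), 0.0, 0.5

# Plain gradient descent on binary cross-entropy. The loss keeps
# shrinking as outputs move toward the correct extreme, so the
# optimization pushes them away from the .5 boundary.
for _ in range(500):
    p = sigmoid(X @ w + b)
    grad = p - y                      # dLoss/dlogit for sigmoid + BCE
    w -= lr * (X.T @ grad) / len(y)
    b -= lr * grad.mean()

p = sigmoid(X @ w + b)
print("outputs within 0.05 of .5:", np.sum(np.abs(p - 0.5) < 0.05))
print("outputs exactly .5:       ", np.sum(p == 0.5))
```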
Thank you. That’s a great explanation. If I understand it, the prior stages (I’m not thinking of just a two-stage system) all have their own probabilities, but the final stage makes the final label a binary 0 or 1.
That is not quite how I would describe it myself. However, if you want to suggest it that way, I am all ears to hear how you develop this idea. Maybe we can start with what you mean by “a stage has its probability”?
Let’s say we are talking about a neural network. How would you connect the concept of “a stage has its probability” to a neural network? What would a stage be in the context of a neural network? In what sense would a stage be probabilistic?
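To offer one possibility again, the way I did with the activation functions: the closest built-in thing I can think of to a probabilistic stage is dropout, where a layer really is a random process at training time. A hypothetical sketch (NumPy; the keep rate is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)

def dropout_stage(x, keep=0.8):
    # Each unit is kept with probability `keep` and zeroed otherwise,
    # so the stage's output is literally a random variable.
    mask = rng.random(x.shape) < keep
    return np.where(mask, x / keep, 0.0)  # inverted-dropout scaling

x = np.ones(8)
print(dropout_stage(x))  # a different random output on every call
```

Is that anywhere near what you have in mind?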
I think you have me at a disadvantage. I’m used to probabilities in statistics and would need to take your course to make any informed statements. But I really appreciate your explanations! I look forward to your course.
I am sorry to have made things look that way. I was interested in your idea and was wondering whether maybe you were leading us toward a Bayesian view, or somewhere else.
I had thought about discussing this in another context (like weather forecasting), but frankly I also worried that, even if the discussion turned out wonderfully, it might be a waste of time if we could not apply everything back to neural networks.
Maybe we could have this conversation again when the time comes. Then we may try to discuss whether, or better, go straight to how, we can model neural network layers as probabilistic processes. I look forward to that day.