Hello,
I’m wondering how the model can determine the triangular area shown in the next picture when, based on the lectures, logistic regression (sigmoid function) applies a linear decision boundary with a threshold of 0.5 on the predicted value.
How can the model determine that the roast is good when the temperature is between 180 and 260 and the duration is between 12 and 14 (the triangular area)?
The image I’m asking about is in the lab C2_W1_Lab02_CoffeeRoasting_TF, in the section at the end: Layer Function.
My question is: how can the model determine whether the roast was done correctly based on the triangular area when logistic regression applies a linear decision boundary?
I understand your question. I just asked for the notebook name because I only have access to the assignment’s repo, not the lectures themselves, so I needed the notebook file name.
Thanks for answering. I will continue with the course because in this Lab there are only 2 layers, input and output, and not 3 hidden layers. I think it’s explained later.
The lab shows the weights that the output unit applies to the three first-layer units in the NN: they’re the W2 and b2 values.
You can work out what those are doing via some pencil and paper work.
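For that pencil-and-paper work, here is a sketch of the forward pass, assuming the lab's two-layer setup (three sigmoid units in layer 1, one sigmoid unit in layer 2) and the course's usual superscript notation. The index $j$ and the vector symbols are just labels chosen for this sketch.

```latex
a^{[1]}_j = g\!\left(\mathbf{w}^{[1]}_j \cdot \mathbf{x} + b^{[1]}_j\right), \quad j = 1, 2, 3
\qquad\qquad
a^{[2]} = g\!\left(\mathbf{w}^{[2]} \cdot \mathbf{a}^{[1]} + b^{[2]}\right),
\qquad
g(z) = \frac{1}{1 + e^{-z}}
```

Each first-layer unit outputs exactly 0.5 on the straight line $\mathbf{w}^{[1]}_j \cdot \mathbf{x} + b^{[1]}_j = 0$, so plugging the printed weights into that equation gives one line per unit in the temperature/duration plane.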
If you haven’t worked with an NN before, then maybe revisit this topic later. This lab is just a “gee whiz!” example of what an NN can do. It doesn’t tell you how - that comes later.
@gmazzaglia I don’t want to go out on a limb here, but I think the key insight is that with logistic regression we are strictly dealing with the two input dimensions, so, as you say, the problem has to be ‘linearly separable’.
However, once we start adding hidden layers, in a more NN kind of way, we are jumping to higher dimensions, even if in the end we map back to that 2D space.
We’re still thinking about all the same 2D points we started with, but we are adding ‘depth’ to see how they might actually be related to one another.
Hard to explain because our minds don’t tend to work well in more than three dimensions, but that is how I understand it.
Put another way, we know there is a function for it because we can easily get there ourselves just by drawing a triangle. But in programming, or in NNs, we look at the points not just as ‘2D’ but in a higher-dimensional space, and that is how we arrive at the shape of that equation.
Note: I would hardly say that what neural nets are doing amounts to a Fourier transform. But just to jog your mind in a completely different way: there is a function for almost anything.
1. Because we have 3 neurons in the 1st layer, we can draw 3 linear boundaries in the input space (which is a 2D space because the input has 2 features).
2. When the NN is in its randomly initialized stage, the 3 boundaries are random; in other words, they don’t form that triangle.
3. As we train the NN, the w’s and b’s in each neuron keep changing to minimize the loss. As the w’s and b’s change, the boundaries move!
4. Minimizing the loss results in that triangle.
5. This triangle is for our visualization only. The NN does not have this vision. Instead, the NN uses the 2nd layer to combine information from the 1st layer to make its final decision.
6. Continuing on 5, how does the 2nd layer’s neuron use the info? It has a trainable weight for each output of the neurons in the 1st layer. These three trainable weights control how the aggregation is done. A good model will aggregate them in a way that, inside the triangle, the aggregated value is high, but low outside the triangle.
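To make the points above concrete, here is a minimal NumPy sketch of a 2-layer network with 3 sigmoid units in the 1st layer and 1 in the 2nd. All weights are hand-picked, hypothetical values on raw (unnormalized) temperature and duration, not the values the lab's trained model ends up with. In this sketch each 1st-layer unit fires on the "bad" side of its own line, and the 2nd layer uses negative weights so the output is high only where none of them fire, i.e. inside the triangle.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, hand-picked weights (assumption: raw units, x = [temperature, duration]).
# Each column of W1 is one 1st-layer unit, i.e. one straight line in the 2D input space.
W1 = np.array([[-1.0,  0.0, 0.2],    # temperature weights for units 1..3
               [ 0.0, -5.0, 5.0]])   # duration weights for units 1..3
b1 = np.array([175.0, 60.0, -110.0])
# Unit 1 fires when temperature < 175, unit 2 when duration < 12,
# unit 3 when 0.2 * temperature + 5 * duration > 110 (the slanted edge).

# 2nd layer: one trainable weight per 1st-layer output, plus a bias.
W2 = np.array([-20.0, -20.0, -20.0])
b2 = 10.0

def predict(x):
    """Return (1st-layer activations, final output) for one input x."""
    a1 = sigmoid(x @ W1 + b1)    # three "which side of my line?" signals
    a2 = sigmoid(a1 @ W2 + b2)   # aggregation: high only if no unit fires
    return a1, a2

points = {
    "inside triangle": [200.0, 13.0],
    "too cold":        [150.0, 13.0],
    "too short":       [200.0, 10.0],
    "overcooked":      [240.0, 14.0],
}
for name, x in points.items():
    a1, a2 = predict(np.array(x))
    print(f"{name:16s} a1={np.round(a1, 3)} output={a2:.4f} good={a2 >= 0.5}")
```

None of the three units sees a triangle; each only reports which side of its own line a point is on. The region appears only once the 2nd layer combines the three signals and the 0.5 threshold is applied.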