I’m having trouble interpreting these graphs accurately. Why does the blue class have only a single output, and why do the outputs for the green and orange classes appear to be positioned along the a1_1 and a1_0 axes?
Hello @abhilash341,
Just to make sure we are on the same page, could you please explain or interpret the meaning of the background colors in each of those 4 graphs? Why would the four backgrounds look different?
Cheers,
Raymond
My interpretation is that when
- both a1_1 and a1_0 are near 0, the output is classified as blue
- a1_1 is higher and a1_0 is near 0, the output is classified as green
- a1_1 is near 0 and a1_0 is higher, the output is classified as orange
- both a1_1 and a1_0 are higher, the output is classified as purple
My question is: why are we seeing only one blue dot?
Is it because the actual 3D plot of (output, a1_1, a1_0) is shown here as a 2D plot of (a1_1, a1_0), so all the blue dots overlap?
Another question that comes to mind: how is the second plot relevant to the first plot shown as
Thanks, @abhilash341, for sharing your interpretation!
We (you and I) will answer your question about the single blue dot at (0, 0) together. I believe it is best if you see the evidence and draft the answer yourself, so here we go! (We will not go into your other question for now, because I want to take one at a time.)
First, we need to appreciate that the model has been trained so well that the different classes get their very own places in the graph, as your interpretation in your last reply indicates.
I believe you have heard people describe this kind of machine learning model as “discriminative”, meaning that a well-trained model is good at “discriminating” one class from the others.
Now, what you are seeing there is that the model is trained so well that, through the projection by the hidden layer, the four classes find their very own places, well separated (discriminated) from one another. How can you see the projection result?
Go to the lab and, after the 9th cell, add, study, and run the three cells of code below:
Make sure you understand what those three cells do.
Then examine the printed results:
- can you explain why all class-0 samples are (0, 0)?
- would a poorly trained network give you all class-0 samples as (0, 0)? You can verify that by retraining a new network with only 1 epoch (currently it uses 200 epochs), but remember that you need to re-run the code cell that instantiates the model before re-running the code cell that actually trains the model.
- does one of the four classes HAVE TO be squeezed into the corner at (0, 0) for the model to discriminate the four classes well? Can you imagine another configuration in which no class needs to be squeezed there or anywhere else?
Cheers,
Raymond
PS: feel free to attempt your other question about the other set of graphs if you think you have some idea about it after retraining a new model with 1 epoch. By then you will have been studying the “projection result” of the hidden layer, which, after all, is basically what that other set of graphs is displaying.
Thank you, Raymond, for the detailed explanation; I really appreciate it. I now understand that for all of those dots, the values of a1_0 and a1_1 before the activation were below 0, which is why the ReLU function replaced them with 0. Can you help me understand what the author is conveying with the graph?
Are they just explaining where the points lie on the graph for different values of a1_1 and a1_0?
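The ReLU observation above can be checked with a small sketch. The pre-activation pairs below are made-up values, not taken from the lab; the point is only that any pair with both components negative lands on exactly (0, 0), so many samples can stack into what looks like a single dot.

```python
def relu(z):
    """ReLU: negative inputs become 0, non-negative inputs pass through."""
    return max(0.0, z)

# Hypothetical pre-activation pairs for five samples (assumed values, not from the lab).
pre_activations = [(-3.2, -0.7), (-1.1, -5.4), (-0.2, -0.1), (2.5, -1.0), (-0.8, 4.1)]

# Apply ReLU to each component to get the hidden-layer outputs (a1_0, a1_1).
activations = [(relu(z0), relu(z1)) for z0, z1 in pre_activations]
for z, a in zip(pre_activations, activations):
    print(z, "->", a)

# Samples that share the same output are drawn on top of one another in a 2D plot.
distinct_points = set(activations)
print(len(activations), "samples,", len(distinct_points), "distinct points")
```

In this sketch the first three samples differ before the activation but become identical after it, which is one way a whole class can appear as one dot.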
Were you sharing a screenshot? If so, could you please just “paste” it in your reply or use the “Upload” button to attach it in your reply?
Because the link you shared requires a password to log in, but we can share screenshots here without that.
Thank you for removing the link. I accidentally pasted the link instead of the actual image.
Hello @abhilash341,
Thanks for updating your post with the screenshot.
I want us to discuss that in a slightly different way. We have a long paragraph in the lab that explains the graphs, so what if you read it again and tell me which part of it you are not sure of, and why? Is it because you have a different understanding, and if so, what is that understanding?
You said “are they just explaining where the points lie on the graph with different values for the a1_1 and a1_0”. I take that as your attempt to answer the question, and I would say YES, THEY ARE, but is that all the paragraph is delivering? Are we taking one whole paragraph just to tell us where the points are? Can you summarize more out of that paragraph?
Try to go deeper and tell me more.
Cheers,
Raymond
I have given it another read, and here is what I understood. Please let me know if my interpretation is correct.
The output of the second layer divides the graph of a1_0 against a1_1 into four regions, classifying the inputs into four categories. The regions are as follows:
- region near a1_0 close to 0 and a1_1 close to 0
- region near a1_0 > 0 and a1_1 close to 0
- region near a1_0 close to 0 and a1_1 > 0
- region near a1_0 > 0 and a1_1 > 0
All the inputs will be mapped to one of these regions. If there were 5 possible classes, the division of the graph of a1_0 against a1_1 into regions would be different.
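The four regions listed above can be sketched as a small lookup. This is a simplification under stated assumptions: the `threshold` cutoff for "close to 0" is hypothetical, the trained second layer learns its own boundaries rather than axis-aligned cutoffs, and the color names follow the interpretation given earlier in this thread.

```python
def region(a1_0, a1_1, threshold=0.5):
    """Return the color of the region a hidden-layer output (a1_0, a1_1) falls in.

    threshold is an assumed cutoff for "close to 0"; the lab does not define one.
    """
    high_0 = a1_0 > threshold
    high_1 = a1_1 > threshold
    if not high_0 and not high_1:
        return "blue"    # both features close to 0
    if high_1 and not high_0:
        return "green"   # a1_1 higher, a1_0 close to 0
    if high_0 and not high_1:
        return "orange"  # a1_0 higher, a1_1 close to 0
    return "purple"      # both features higher

# A few made-up points, one per region.
for point in [(0.0, 0.0), (0.1, 3.0), (3.0, 0.1), (2.0, 2.0)]:
    print(point, "->", region(*point))
```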
Hello @abhilash341,
Very true! I think the above is a very good deduction from what I am quoting from the lab below:
One way to think of this is the first layer has created a new set of features
Certainly, the new set of features is {a1_0, a1_1}. Should there be four classes, a well-trained model creates a set of new features that sends the samples of each class to their own region, as exemplified by the lab. Should there be five classes, a well-trained model should create another set of new features that differentiates the five classes.
However, does it mean that a well-trained model can always find a corner for a class even when we have, say, 10000 classes? When the model reaches the limit, we may then need to think about whether it is sufficient to have only two new features, and there may be times when we need three, ten, or even more features to suit the problem at hand.
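As a rough counting illustration of the "corner" idea, and not a rule for sizing a network: if each ReLU feature is treated as either 0 or positive, then n features give 2^n on/off patterns. The helper names below are hypothetical, and a real model can also separate classes by where they sit along an axis, so classes do not strictly need a corner each.

```python
def corner_count(num_features):
    """Number of on/off patterns when each ReLU feature is either 0 or positive."""
    return 2 ** num_features

def min_features_for_corners(num_classes):
    """Smallest number of features that gives every class its own on/off pattern."""
    n = 0
    while corner_count(n) < num_classes:
        n += 1
    return n

for classes in (4, 5, 10000):
    print(classes, "classes ->", min_features_for_corners(classes), "features")
```

Under this simplified view, two features give exactly four corners, which matches the four-class lab, while 10000 classes would need at least 14 features to get a corner each.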
Cheers,
Raymond