Could someone help me understand this logistic loss function graph from optional lab

This question relates to Optional Lab: Logistic Loss module in Week 3 of Supervised Machine Learning: Regression and Classification.

Could someone help me understand the last plot where it plots Logistic cost vs (w,b) and Log(Logistic cost) vs (w,b) is trying to show?

I understand we are plotting the cost function for the new logistic loss function. But what is the reason for using log (cost) on the right plot.

Along with it, what is the meaning of Logistic cost vs Log (Logistic cost)?

Hello @pujesh

Above is the first view we would see after running the code, and it is difficult for us to tell from the left plot more precisely where the minimum is. By logging the cost, a deep declining surface can be easily seen in the right plot. An apparent declining surface is what the right plot is trying to unveil.

Your screenshot seems to show that you had successfully rotated the left plot such that we could already see a declining surface, making the right plot not as necessary. That’s great!

Cheers,
Raymond

2 Likes

got it. thank you so much for the explanation.

just to confirm one more thing - does the left and the right plot holds the same value? meaning both of the plot refers to the same value but in order to see the depth of the plot clearly, we take a log of cost?

Hello @pujesh,

The (cost) values in the two plots are only differed by \log, and yes, we take the log to get a better visual.

To illustrate the first half of my response above, in the left plot below you can see that I drew a line starting from the boundary at w = -2.5 to the opposite side and then go upward along the “Cost” axis until I reached Cost = 2.5. That end point is the intersection point between the cost surface and the Cost axis at w = -2.5 (and b somewhere more than 2.5).

If we follow the same procedure but do it on the right plot, you will find that the same intersection point becomes somewhere right below 1 and that is because \log(2.5) = 0.92. This says that the intersection point didn’t get moved to a different position (in terms of w and b), only its height got “re-scaled”.

Lastly, when we are dealing with a plot that covers a very large range, it is actually a common skill to use the “log scale” to unveil pattern at small scale. Think about this: the right plot zooms into the range where log(Cost) = -5 to -1, what is this range in the Cost scale (without log)? The answer is 0.007 to 0.368. It’s what I mean by “small scale” when compared to the large range (0-20, in the left plot)

Cheers,
Raymond

1 Like

thank you so much for explaining all of this. this totally makes sense to me now.

1 Like

You are welcome, @pujesh!