Since the input training examples x^{(i)} are listed as rows of discrete values, one per row, shouldn't the plot of the logistic regression loss function also be a plot of discrete values of L, one for each x^{(i)}, rather than a plot of a continuously-valued function against a continuously-valued input?
Hi @ai_is_cool
Can you walk us through the steps of the reasoning that support your conclusion?
Hi @Kic,
I should have said in my previous post that the loss function is plotted against w_j and b, which do take on only a limited set of values, since each weight parameter w_j and the bias b are updated once per iteration.
However, it is useful to see the “shape” of the loss function for ANY value of input.
Thanks.
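To make that concrete, here is a minimal sketch that evaluates the logistic-regression cost J on a dense grid of w values while holding b fixed; the toy data, the fixed bias, and the grid range are all my own illustrative assumptions, not values from the course. The resulting curve is smooth even though the training examples themselves are discrete rows:

```python
import numpy as np

# Hypothetical toy data (e.g. tumor size vs. malignant label), for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 0, 1, 1])

def cost(w, b):
    """Logistic-regression cost J(w, b) averaged over all m examples."""
    f = 1.0 / (1.0 + np.exp(-(w * x + b)))   # sigmoid of the linear model
    f = np.clip(f, 1e-12, 1 - 1e-12)         # avoid log(0) at extreme w values
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

b_fixed = -3.0                                # hold b constant, vary w
w_grid = np.linspace(-2.0, 4.0, 200)          # dense grid of w values
J_grid = [cost(w, b_fixed) for w in w_grid]
# Plotting J_grid against w_grid shows a smooth curve, not discrete points.
```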
You might do something like the example below to find the shape analytically. In that example, three pairs of (w, J) are enough to solve for all of the coefficients. Alternatively, we can derive the coefficients analytically in terms of x^{(i)} and y^{(i)} and compute them, or we can interpolate. Linear interpolation certainly introduces error, because it only gives us a piecewise-linear approximation.
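As one possible reading of the "3 pairs of (w, J)" idea: if we assume a squared-error cost for a single-feature linear model, J(w) is exactly quadratic in w, so it has three coefficients and three sample pairs pin it down exactly. The data, sample points, and function names below are hypothetical:

```python
import numpy as np

# Toy data; squared-error cost J(w) = (1/2m) * sum((w*x - y)^2) is quadratic in w,
# so three (w, J) pairs determine its coefficients exactly.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def J(w):
    return np.mean((w * x - y) ** 2) / 2.0

w_samples = np.array([0.0, 1.0, 2.0])          # any three distinct w values
J_samples = np.array([J(w) for w in w_samples])

# Solve for a, b, c in J(w) = a*w^2 + b*w + c via a 3x3 linear system.
A = np.vstack([w_samples**2, w_samples, np.ones_like(w_samples)]).T
a, b, c = np.linalg.solve(A, J_samples)
print(a, b, c)                                  # recovers the exact quadratic

# Linear interpolation between the same samples would instead give a
# piecewise-linear approximation, with error between the sample points.
```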
Hi @ai_is_cool,
If you are referring to the cancer tumor size, age, etc. as the input training examples seen in the following screenshot, then please be aware that this data set is for demonstration purposes and just happens to contain whole numbers. In the real world, the input training data can be represented as floats, or as integers for discrete whole numbers.
Let’s have a look at the loss function from this screenshot:
The loss for each data point (example) is not discrete, because the output of f(x) is a probability value, which is a float. The loss curve is plotted from the loss of each data point.
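As a quick illustration (with made-up probabilities f(x), not values from the screenshot), the per-example logistic loss -y log(f) - (1-y) log(1-f) is a continuous float for every example:

```python
import numpy as np

# Hypothetical predictions f(x) for five examples; f(x) is a probability (a float),
# so the loss of each example is also a continuous value.
f = np.array([0.10, 0.35, 0.62, 0.80, 0.95])
y = np.array([0, 0, 1, 1, 1])

loss = -y * np.log(f) - (1 - y) * np.log(1 - f)  # per-example logistic loss
print(loss)  # approximately [0.105, 0.431, 0.478, 0.223, 0.051]
```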
During training, the model goes through a number of iterations over the whole training set, so we use the cost curve rather than the loss curve to get an idea of how the model is doing. If the cost keeps decreasing, we know the model is on track to find the w and b at which the cost is at its minimum.
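A rough sketch of what recording that cost curve looks like in code, assuming plain batch gradient descent on toy data (the data, learning rate, and iteration count are all illustrative choices of mine):

```python
import numpy as np

# Minimal gradient-descent sketch that records the cost at every iteration
# so it can be plotted as a cost curve afterwards.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 0, 1, 1])
w, b, alpha = 0.0, 0.0, 0.1

cost_history = []
for _ in range(100):
    f = 1.0 / (1.0 + np.exp(-(w * x + b)))   # predictions for all examples
    cost_history.append(np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f)))
    w -= alpha * np.mean((f - y) * x)        # gradient of J w.r.t. w
    b -= alpha * np.mean(f - y)              # gradient of J w.r.t. b
# A steadily decreasing cost_history indicates training is on track.
```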