C1_W3_Logistic_Regression_Potential problem

In public_tests.py, in the compute_cost_reg_test(target) function definition, there is the following code:

In the lectures Professor Andrew Ng specifically said:

“Since y takes only values 0 or 1, we can write the simplified formula for the cost function…”

But in this test case, y takes the value 0.5.
This might be a problem for people who have implemented their cost function using an approach like this:

-np.sum(np.log(f)[y==1]) - ...

Filtering by y's equality with 1 and 0 completely ignores the y = 0.5 entries in the test case, which can lead to frustration and confusion about where you went wrong.
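For concreteness, here is a minimal sketch of such a filtering-based implementation (the name compute_cost_filtered is made up; it assumes f is the vector of sigmoid outputs for all m examples and that y contains only 0s and 1s):

    import numpy as np

    def compute_cost_filtered(f, y):
        """Filtering-based cost: use log(f) only where y == 1
        and log(1 - f) only where y == 0."""
        m = y.shape[0]
        cost = -np.sum(np.log(f)[y == 1]) - np.sum(np.log(1 - f)[y == 0])
        return cost / m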

Furthermore, filtering by y's equality with 1 and 0 can, surprisingly, be a better approach than a simple dot product (or, equivalently, a for loop):

... - np.log(1-f) @ (1-y)
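For reference, the full dot-product version that this fragment comes from would look roughly like the sketch below (names again just illustrative):

    import numpy as np

    def compute_cost_dot(f, y):
        """Vectorized cost using dot products; every example contributes
        both log terms, even when one of them is -inf."""
        m = y.shape[0]
        cost = -np.log(f) @ y - np.log(1 - f) @ (1 - y)
        return cost / m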

The reason is that for some training example, z can happen to be a relatively large value, like 38, and f == sigmoid(38) == 1.0 in Python floating point, which makes np.log(1-f) equal to -np.inf. And since -np.inf * 0 gives nan, this can make np.log(1-f) @ (1-y) evaluate to nan, so the whole sum and the cost function become nan, only because of one such training example!
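You can reproduce this behaviour in a few lines of NumPy (a standalone illustration, not assignment code):

    import numpy as np

    f = 1 / (1 + np.exp(-38.0))   # sigmoid(38) rounds to exactly 1.0 in float64
    print(f == 1.0)               # True
    print(np.log(1 - f))          # -inf (with a divide-by-zero RuntimeWarning)
    print(np.log(1 - f) * 0)      # nan, because -inf * 0 is undefined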

This is undesirable if you want to track the cost function.

That is why I think it is better to implement the cost function using filtering on y, and why all the test cases where y takes values other than 0 or 1 should be removed from the course or rewritten.

Hi @VladimirFokow, I think you do make a point here about the np.inf problem, and you might also be interested in how popular ML packages take care of it while keeping the loss formula the way it is most widely known today, which is -y\log(f)-(1-y)\log(1-f).

In TensorFlow, if you trace through the source code from this beginning, you will end up at these lines:

  output = tf.clip_by_value(output, epsilon_, 1. - epsilon_)
  return -tf.reduce_sum(target * tf.math.log(output), axis)

The first line clips the values of f so that both f and 1-f always stay between epsilon_ and 1 - epsilon_, where epsilon_ is a very small number (1e-07), so it avoids the -inf problem.

sklearn’s implementation has this line as well:

    y_pred = np.clip(y_pred, eps, 1 - eps)

and their eps = 1e-15.
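The same clipping trick is easy to apply in plain NumPy; here is a rough sketch of the standard formula with clipping (the function name and the eps default are just for illustration):

    import numpy as np

    def compute_cost_clipped(f, y, eps=1e-15):
        """Standard cross-entropy cost, with f clipped away from 0 and 1
        so that neither log term can become -inf."""
        f = np.clip(f, eps, 1 - eps)
        m = y.shape[0]
        return (-np.log(f) @ y - np.log(1 - f) @ (1 - y)) / m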

So I believe it is good if we stick with the formula the way it is, as this is indeed how popular implementations use it, and we stay consistent with the way we learn it.

I did the assignment as well and I didn't encounter the np.inf problem using the formula, so I think it was designed to avoid the problem that you described.

Or I could just use [y>=0.5] and [y<0.5] instead of strict equality, I've just realized. :grinning:

Thank you for the advice and resources about np.inf!

Not a problem @VladimirFokow :slight_smile:

Indeed, you pointed out that Professor Andrew Ng said y takes only 0 or 1, so I think the motivation behind your use of [y>=0.5] is certainly justified. However, I think the purpose of the test is to make sure we implemented the formula the way it is, and I think this is also justified.

Even if your way helps you pass the assignment, please also keep the formula itself as a take-away, not just because this is how we introduce it, but also because this is how popular implementations use it, so you should expect to see it everywhere, and others may expect to see it from you too.

Actually, using the formula will make your code simpler as well, as you will see if you go back to the TensorFlow implementation, and it is also efficient.

Good luck :slight_smile:
