C1_W3_Logistic_Regression_Potential problem

In public_tests.py, in the compute_cost_reg_test(target) function definition, there is the following code:

In the lectures Professor Andrew Ng specifically said:

“Since y takes only values 0 or 1, we can write the simplified formula for the cost function…”

But in this test case, y takes the value 0.5.
This might be a problem for people who have implemented their cost function using an approach like this:

-np.sum(np.log(f)[y==1]) - ...

Filtering by y's equality with 1 and 0 completely ignores the y = 0.5 entries in the test case, which can lead to frustration and confusion about where you went wrong.
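For concreteness, here is a minimal sketch of such a filtering-based implementation (the name compute_cost_filtered is made up; it assumes f is the vector of sigmoid outputs for all m examples and that y contains only 0s and 1s):

    import numpy as np

    def compute_cost_filtered(f, y):
        """Filtering-based cost: use log(f) only where y == 1
        and log(1 - f) only where y == 0."""
        m = y.shape[0]
        cost = -np.sum(np.log(f)[y == 1]) - np.sum(np.log(1 - f)[y == 0])
        return cost / m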

Furthermore, filtering by y's equality with 1 and 0 can, surprisingly, be a better approach than a simple dot product (or, equivalently, a for loop):

... - np.log(1-f) @ (1-y)
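For reference, the full dot-product version that this fragment comes from would look roughly like the sketch below (names again just illustrative):

    import numpy as np

    def compute_cost_dot(f, y):
        """Vectorized cost using dot products; every example contributes
        both log terms, even when one of them is -inf."""
        m = y.shape[0]
        cost = -np.log(f) @ y - np.log(1 - f) @ (1 - y)
        return cost / m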

The reason is that for some training example, z can happen to be a relatively large value, like 38, and f == sigmoid(38) == 1.0 in Python floating point, which makes np.log(1-f) equal to -np.inf. And since -np.inf * 0 gives nan, this can make np.log(1-f) @ (1-y) evaluate to nan, so the whole sum and the cost function become nan, only because of one such training example!
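You can reproduce this behaviour in a few lines of NumPy (a standalone illustration, not assignment code):

    import numpy as np

    f = 1 / (1 + np.exp(-38.0))   # sigmoid(38) rounds to exactly 1.0 in float64
    print(f == 1.0)               # True
    print(np.log(1 - f))          # -inf (with a divide-by-zero RuntimeWarning)
    print(np.log(1 - f) * 0)      # nan, because -inf * 0 is undefined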

This is undesirable if you want to track the cost function.

That is why I think it is better to implement the cost function using filtering on y, and why all the test cases where y takes values other than 0 or 1 should be removed from the course or rewritten.

Hi @VladimirFokow, I think you do make a point here about the np.inf problem, and you might also be interested in how popular ML packages take care of it while keeping the loss formula the way it is most widely known today, which is -y\log(f)-(1-y)\log(1-f).

In TensorFlow, if you trace through the source code from this beginning, you will end up at these lines:

  output = tf.clip_by_value(output, epsilon_, 1. - epsilon_)
  return -tf.reduce_sum(target * tf.math.log(output), axis)

The first line clips the values of f so that both f and 1-f always stay between epsilon_ and 1 - epsilon_, where epsilon_ is a very small number (1e-07), so it avoids the -inf problem.

sklearn’s implementation has this line as well:

    y_pred = np.clip(y_pred, eps, 1 - eps)

and their eps = 1e-15.
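The same clipping trick is easy to apply in plain NumPy; here is a rough sketch of the standard formula with clipping (the function name and the eps default are just for illustration):

    import numpy as np

    def compute_cost_clipped(f, y, eps=1e-15):
        """Standard cross-entropy cost, with f clipped away from 0 and 1
        so that neither log term can become -inf."""
        f = np.clip(f, eps, 1 - eps)
        m = y.shape[0]
        return (-np.log(f) @ y - np.log(1 - f) @ (1 - y)) / m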

So I believe it is good if we stick with the formula the way it is, as this is indeed how popular implementations use it, and we stay consistent with the way we learn it.

I did the assignment as well and I didn't encounter the np.inf problem using the formula, so I think it was designed to avoid the problem that you described.

Or I could just use [y>=0.5] and [y<0.5] instead of strict equality, I've just realized. :grinning:

Thank you for the advice and resources about np.inf!

Not a problem @VladimirFokow :slight_smile:

Indeed, you pointed out that Professor Andrew Ng said y takes only 0 or 1, so I think the motivation behind your use of [y>=0.5] is certainly justified. However, I think the purpose of the test is to make sure we implemented the formula the way it is, and I think this is also justified.

Even if your way helps you pass the assignment, please also keep the formula itself as a take-away, not just because this is how we introduce it, but also because this is how popular implementations use it, so you should expect to see it everywhere, and others may expect to see it from you too.

Actually, using the formula will make your code simpler as well, as you will see if you go back to the TensorFlow implementation, and it is also efficient.

Good luck :slight_smile:
