Problems encountered in logistic regression

This is a source code file that implements logistic regression by hand. My questions are about the gradient descent step and the error-calculation part inside it.

Here, I added the division by the number of samples myself; it was not in the original code.

What I want to ask is: is this batch gradient descent? If so, why is the gradient only accumulated here rather than averaged?

And for this part, the code treats samples whose loss exceeds 0.5 as misclassified and computes the error rate from that.

I don’t understand why a loss above 0.5 should mean the sample was classified wrongly.

Here is the source code file.
By the way, my English is not very good; this was machine translated, so there may be some ambiguity.
{mentor edit: removed the attached code link, since it relates to one of the graded assignments}

I don’t think you can post the exercise solution code like that, but:

This is definitely “batch gradient descent”: you take the whole batch (X, y), compute a gradient, and then apply that gradient to the current parameters (as opposed to cutting the batch into mini-batches and processing those, or even processing the pairs (X[i], y[i]) one by one).

As I see it, you may divide the accumulated gradient by the number of examples, but it’s just a scaling factor. You may consider it to be already included in self.learning_rate.
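A minimal sketch of what this looks like (my own names for X, y, params, and learning_rate, not necessarily the notebook’s):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gradient_step(params, X, y, learning_rate):
    # One step of batch gradient descent over the whole batch (X, y)
    m = X.shape[0]                      # number of examples
    y_hat = sigmoid(X @ params)         # predictions for all examples at once
    grad_sum = X.T @ (y_hat - y)        # accumulated (summed) per-sample gradients
    # Dividing by m only rescales the step; it could equally well be
    # absorbed into the learning rate (i.e. use learning_rate / m with grad_sum).
    return params - learning_rate * (grad_sum / m)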

Computing the gradient

Is the formula in gradient() for sample_gradient correct? It computes

(\hat{y} - y) * x

where \hat{y} is the prediction.

That doesn’t look like the right formula. (Update: I am wrong, it’s correct!)
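For completeness, here is the standard derivation of that per-sample gradient. With z = params \cdot x, \hat{y} = \sigma(z), and the cross-entropy loss loss(x) = - y * log(\hat{y}) - (1-y) * log(1-\hat{y}) (given further down):

\frac{\partial loss}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}} \qquad \frac{\partial \hat{y}}{\partial z} = \hat{y}(1-\hat{y})

Multiplying the two and simplifying gives \frac{\partial loss}{\partial z} = \hat{y} - y, and since \frac{\partial z}{\partial params} = x,

\frac{\partial loss}{\partial params} = (\hat{y} - y) * x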

There is also an unnecessary call to current_x = np.asarray(current_X) right before that.

Incrementing the error

The criterion for whether to increment the error doesn’t look right to me. If the loss > 0.5 we can’t conclude much. The correct criterion would be (with log the natural logarithm)

loss > log(2) \approx 0.693147.

Because the decision table is:

y = 1 and \hat{y} \geq \frac{1}{2}: classified correctly
y = 1 and \hat{y} < \frac{1}{2}: error
y = 0 and \hat{y} \leq \frac{1}{2}: classified correctly
y = 0 and \hat{y} > \frac{1}{2}: error

This can be handled directly with an if/else on y and \hat{y}, or else via the loss:

With the prediction being:

\hat{y} = \sigma(params \cdot x)

and

loss(x) = - y * log(\hat{y}) - (1-y) * log(1-\hat{y})

In case of an error with y = 1 and \hat{y} < \frac{1}{2}, we can write \hat{y} = \frac{1}{2}*\mu with \mu < 1, so

loss(x) = - log(\hat{y}) = -log(\frac{1}{2}*\mu)
loss(x) = log(2) - log(\mu) with log(\mu) \in ]-\infty, 0[
loss(x) > log(2)

Similarly for y = 0 and \hat{y} > \frac{1}{2}
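In code, the two equivalent checks could be sketched like this (my own helper names; the notebook’s variables may differ):

import numpy as np

def is_misclassified(y, y_hat):
    # Compare the label with the thresholded prediction directly
    return (y == 1 and y_hat < 0.5) or (y == 0 and y_hat > 0.5)

def is_misclassified_via_loss(y, y_hat):
    # Equivalent check through the cross-entropy loss: threshold log(2), not 0.5
    loss = -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)
    return loss > np.log(2)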

PS

You want to rename the method cost() to loss(). It is computing the loss of a single example after all.

PPS

In train()

para = initial_para       # Avoid modifying initial parameters

I don’t think that is going to work: para will just be another reference to initial_para, not a copy of initial_para.

You want:

para = initial_para.copy()                  # Avoid modifying initial parameters

On the other hand, a new array is created anyway in gradient(), so it doesn’t even matter.
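A quick way to see the difference (a small standalone example, unrelated to the notebook’s actual arrays):

import numpy as np

initial_para = np.zeros(3)
para = initial_para                # same object, just another name for it
para += 1.0                        # in-place update also changes initial_para
print(initial_para)                # [1. 1. 1.]

initial_para = np.zeros(3)
para = initial_para.copy()         # independent copy
para += 1.0                        # initial_para is left untouched
print(initial_para)                # [0. 0. 0.]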


First of all, I’m really grateful that you were willing to answer my questions. (I’m genuinely excited that my first post got a reply; I only started learning machine learning not long ago.)

Secondly, I came across this piece of code while completing an assignment (the assignment was to add comments to the code). After analyzing it carefully, I found some aspects of it that seemed unreasonable. I tried asking my teacher, but since my comprehension couldn’t keep up, I only had a superficial understanding of many of the technical terms, and my doubts were not resolved.

Finally, reading your answer convinced me once more that my intuition was correct. Your response was not only simple and easy to understand but also accurate and effective (having my understanding confirmed really matters a lot to me). Thank you again.

By the way, this was also machine translated. I promise I will study English hard from now on. :face_holding_back_tears:


In general, when doing your own experiments, you may do as you wish with the code.

But when you are working on a graded assignment, do not make any changes to the notebook except for adding your code where instructed.

Unexpected changes to a graded notebook can cause errors in the grader that are very difficult to debug.