Implementing L2 regularization for logistic regression

Dear all,
I’m currently trying to implement L2 regularization for logistic regression and I’m stuck at the derivative of the log likelihood. Below is the formula:
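Written out in LaTeX, and assuming it matches what the code further down computes (the symbols $\lambda$ for `l2_term`, $N$ for the number of rows, and $x_{ij}$ for `X[i, j]` are my notation, and the intercept $w_0$ is not penalized):

```latex
\frac{\partial \ell}{\partial w_j}
  = \sum_{i=1}^{N} x_{ij}\,\Bigl(\mathbf{1}[y_i = +1] - P(y_i = +1 \mid \mathbf{x}_i, \mathbf{w})\Bigr)
    - 2\lambda w_j \quad (j \ge 1),
\qquad
\frac{\partial \ell}{\partial w_0}
  = \sum_{i=1}^{N} x_{i0}\,\Bigl(\mathbf{1}[y_i = +1] - P(y_i = +1 \mid \mathbf{x}_i, \mathbf{w})\Bigr),
\qquad
P(y_i = +1 \mid \mathbf{x}_i, \mathbf{w}) = \frac{1}{1 + e^{-\mathbf{w}^{\top}\mathbf{x}_i}}
```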

I wrote a small function to implement it from scratch. With the parameter ver = 'ver1', I use a for loop, so the weights are updated one at a time, and it outputs the correct answer.

However, with the parameter ver = 'ver2', I use array slicing for the update and the output is wrong. The strange thing is that it still manages to get 50 out of 194 weights correct.

import numpy as np

def logistic_regression_with_L2(X, y, initial_w, step_size, l2_term, max_iter=100, ver='ver1'):
  w = np.array(initial_w)  # copy, so initial_w itself is not modified
  for itr in range(max_iter):

    score = X.dot(w)
    prob = 1. / (1. + np.exp(-score))  # P(y = +1 | x, w)
    indicator = (y == +1)
    errors = indicator - prob

    if ver == 'ver1':   # update weights one by one
      for j in range(w.shape[0]):
        derivative = X[:, j].T.dot(errors)
        if j == 0:
          # intercept is not regularized; we add because we maximize the log likelihood
          w[0] += step_size * derivative
        else:
          w[j] += step_size * (derivative - 2 * l2_term * w[j])

    elif ver == 'ver2':   # update weights using slicing
      derivative = X.T.dot(errors)
      w[0] += step_size * derivative[0]
      w[1:] += step_size * (derivative[1:] - 2 * l2_term * w[1:])

  # return only after all iterations have run
  return w

# input for the above function
initial_w = np.zeros((194, 1))
step_size = 5e-6
max_iter = 501
l2_term = 1e3
# X_train.shape is (42361, 194)
# y_train.shape is (42361, 1)

w_ver1 = logistic_regression_with_L2(X_train, y_train, initial_w, step_size, l2_term, max_iter, ver='ver1')
w_ver2 = logistic_regression_with_L2(X_train, y_train, initial_w, step_size, l2_term, max_iter, ver='ver2')

(Note that we want to maximize the log likelihood, so the update is weights += step_size * derivative, not weights -= step_size * derivative.)

If my function were wrong, it should not get any of the weights right after 500 iterations, right?
What do you think of 'ver2'? Please help me with this. I would really appreciate it if someone could help :slight_smile:

This is the comparison between the two versions. Version 2 still got about 25% of the weights exactly right!

When I subtracted the two weight arrays, I realized that the differences between corresponding numbers are very small. My assumption is that NumPy can produce slightly different floating-point results when updating element by element versus updating through array slicing.
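A minimal sketch of that assumption, using random synthetic data (not the original X_train; the array sizes, the seed, and the tolerances are arbitrary choices for illustration). It computes the derivative once per column, as in 'ver1', and once as a single matrix-vector product, as in 'ver2', then compares them both exactly and within a tolerance. The two are the same sum mathematically, but the additions may be carried out in a different order, and floating-point addition is not associative, so the last bits can differ.

```python
import numpy as np

rng = np.random.default_rng(0)           # synthetic stand-in data, hypothetical sizes
X = rng.normal(size=(2000, 20))
errors = rng.normal(size=(2000, 1))

# 'ver1' style: one dot product per column, each of shape (1,)
d_loop = np.array([X[:, j].dot(errors) for j in range(X.shape[1])])  # shape (20, 1)

# 'ver2' style: a single matrix-vector product
d_vec = X.T.dot(errors)                  # shape (20, 1)

# Bit-for-bit equality can fail for some entries even though the math is identical
exact_matches = int(np.sum(d_loop == d_vec))
max_abs_diff = float(np.max(np.abs(d_loop - d_vec)))

# Comparing within a tolerance is the usual way to check floating-point results
close = bool(np.allclose(d_loop, d_vec, rtol=1e-9, atol=1e-12))

print(exact_matches, "of", d_loop.size, "entries are bit-for-bit equal")
print("largest absolute difference:", max_abs_diff)
print("equal within tolerance:", close)
```

Applied to the weights from the question, something like `np.isclose(w_ver1, w_ver2).mean()` would report the fraction that agree within tolerance, rather than the fraction that happen to be bit-for-bit identical.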

I have not found any documentation about this. If you know of any, please tell me. Thanks.