Dear all,
I’m currently trying to implement L2-regularized logistic regression and got stuck on the derivative of the log likelihood. Below is the formula:
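In text form (this is what my code below implements; $\lambda$ is the l2_term, and the intercept $w_0$ is not penalized):

$$\frac{\partial \ell(\mathbf{w})}{\partial w_j} = \sum_{i=1}^{N} x_{ij}\,\bigl(\mathbf{1}[y_i = +1] - P(y_i = +1 \mid \mathbf{x}_i, \mathbf{w})\bigr) - 2\lambda w_j$$

and since we maximize, the ascent step is $w_j \leftarrow w_j + \eta \,\partial \ell(\mathbf{w})/\partial w_j$, where $\eta$ is the step_size.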
I wrote a small function to run it from scratch. With parameter ver = ‘ver1’, I use a for loop, the weights get updated one by one, and it outputs the correct answer.
However, with parameter ver = ‘ver2’, I use array slicing to update them all at once, and it outputs the wrong answer. But the strange thing is that it still manages to get 50 out of 194 weights correct.
import numpy as np

def logistic_regression_with_L2(X, y, initial_w, step_size, l2_term, max_iter=100, ver='ver1'):
    w = np.array(initial_w)
    for itr in range(max_iter):
        score = X.dot(w)
        prob = 1. / (1. + np.exp(-score))   # sigmoid: P(y=+1 | x, w)
        indicator = (y == +1)
        errors = indicator - prob           # 1[y=+1] - P(y=+1 | x, w)
        if ver == 'ver1':                   # update weights one by one
            for j in range(w.shape[0]):
                derivative = X[:, j].T.dot(errors)
                if j == 0:                  # intercept: no L2 penalty
                    w[0,] = w[0,] + step_size * derivative  # in log likelihood, we want to maximize
                else:
                    w[j,] = w[j,] + step_size * (derivative - 2 * l2_term * w[j,])
        elif ver == 'ver2':                 # update all weights at once using slicing
            derivative = X.T.dot(errors)
            w[0,] = w[0,] + step_size * derivative[0,]
            w[1:,] = w[1:,] + step_size * (derivative[1:,] - 2 * l2_term * w[1:,])
    return w
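To rule out anything specific to my dataset, here is a tiny self-check on synthetic data (X_toy and y_toy are made up for this test, not my real training set). I expected the two versions to agree here:

rng = np.random.default_rng(0)
X_toy = rng.standard_normal((100, 5))                # 100 points, 5 features
y_toy = np.where(rng.random((100, 1)) > 0.5, 1, -1)  # labels in {+1, -1}
w0 = np.zeros((5, 1))

w_a = logistic_regression_with_L2(X_toy, y_toy, w0, 1e-3, 1.0, 50, ver='ver1')
w_b = logistic_regression_with_L2(X_toy, y_toy, w0, 1e-3, 1.0, 50, ver='ver2')
print(np.max(np.abs(w_a - w_b)))  # I expected this to print ~0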
# inputs for the function above
initial_w = np.zeros((194, 1))
step_size = 5e-6
l2_term = 1e3
max_iter = 501
# X_train.shape is (42361, 194); y_train.shape is (42361, 1)
w_ver1 = logistic_regression_with_L2(X_train, y_train, initial_w, step_size, l2_term, max_iter, ver='ver1')
w_ver2 = logistic_regression_with_L2(X_train, y_train, initial_w, step_size, l2_term, max_iter, ver='ver2')
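And this is roughly how I counted the “correct” weights between the two runs (my own rough check; exact float equality may be too strict, so I also look at np.isclose):

exact = np.sum(w_ver1 == w_ver2)            # exact float equality
close = np.sum(np.isclose(w_ver1, w_ver2))  # equal within default tolerance
print(exact, 'exact and', close, 'near matches out of', w_ver1.shape[0])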
(In case it’s not clear: we want to maximize the log likelihood, so the update is weights += step_size * derivative, not weights -= step_size * derivative.)
If I had written the function wrong, it shouldn’t get any weights correct after 500 iterations, right?
What do you think of ‘ver2’? Please help me with this; I would really appreciate any pointers.
This is the similarity between the two versions: version 2 still got 50 of 194 weights (about 25%) right!