I am trying to follow the notebook for this lab, but for the cost function I am implementing it fully vectorized rather than with loops as shown in the lab.
For some reason, when using a vectorized approach I am getting
[9.34734243e-12]
Cost at optimal w : 1.1684178034497506e-12
while the solved notebook outputs
9.347342928021922e-12
Cost at optimal w : 1.5578904880036537e-12
Does anyone know the reason for this difference?
Hey @ftayyab,
Welcome to the community. This difference is due to the way NumPy functions work internally and to the way they handle numerical precision. Compare the two formulations below:
# Formulation 1: fully vectorized
mul_pro = np.dot(X, w) + b
cost = np.sum((mul_pro - y)**2) / (2 * m)
print(mul_pro, cost)
# Formulation 2: loop over the training examples
cost = 0.0
for i in range(m):
    f_wb_i = np.dot(X[i], w) + b  # (n,)·(n,) = scalar (see np.dot)
    print(f_wb_i)
    cost = cost + (f_wb_i - y[i])**2  # scalar
cost = cost / (2 * m)
Both of them essentially implement the same formula. But when you run this code, you will observe differences between the outputs of np.dot() in the two formulations.
In the first formulation,
mul_pro = [460. 232. 178.]
And in the second formulation,
f_wb_i = 459.99999761940825, 231.9999983694081, 177.99999898940814
As you can see, np.dot() gives slightly different output for differently shaped operands. In the first case, we are multiplying (3, 4) and (4, 1) arrays, and in the second case, (1, 4) and (4, 1) arrays. I hope this helps.
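If you want to verify this yourself, here is a minimal sketch of the comparison (the data below is a placeholder, not the lab's actual training set; any difference between the two sets of predictions is pure floating-point rounding):

import numpy as np

# Placeholder data for illustration only -- not the lab's actual training set.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([5.0, 11.0, 17.0])
w = np.array([1.0, 2.0])
b = 0.0
m = X.shape[0]

pred_vec = np.dot(X, w) + b                                    # one matrix-vector product
pred_loop = np.array([np.dot(X[i], w) + b for i in range(m)])  # one dot product per row

np.set_printoptions(precision=14)
print(pred_vec, pred_loop)
print(np.allclose(pred_vec, pred_loop))  # True: the two agree to within rounding error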
Regards,
Elemento
Thanks for the explanation. I had to set
np.set_printoptions(precision=14)
to get both to print the same results.
Why can't we use vector operations with np.dot() for the cost and gradient descent steps? Or, if we can, does the equation change? np.dot() behaves differently based on the operands.
Hey @ftayyab,
We can definitely use np.dot() for vector operations in the cost and gradient descent steps; in fact, that is exactly what the vectorized approach uses. The equation in both cases remains the same. It's just as you mentioned: np.dot() behaves differently based on the operands.
So, when we use np.dot() for vector operations in an assignment, we define the test cases with it too, so that the outputs of your implementation and of the test cases match and learners can pass the tests. The use of a for loop in this exercise should not lead you to think that we can't use np.dot() for vector operations.
In simple words, you can always use np.dot()
for vector operations, just not for this exercise. I hope this helps.
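For reference, here is a minimal sketch of what a fully vectorized cost and gradient could look like (my own illustration, not the assignment's reference implementation; it assumes X has shape (m, n), y shape (m,), w shape (n,), and b is a scalar):

import numpy as np

def compute_cost_vectorized(X, y, w, b):
    # Predictions for all m examples at once: (m, n) @ (n,) -> (m,)
    f_wb = X @ w + b
    m = X.shape[0]
    return np.sum((f_wb - y) ** 2) / (2 * m)

def compute_gradient_vectorized(X, y, w, b):
    m = X.shape[0]
    err = X @ w + b - y       # (m,) residuals
    dj_dw = (X.T @ err) / m   # (n,) gradient with respect to w
    dj_db = np.sum(err) / m   # scalar gradient with respect to b
    return dj_dw, dj_db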
Regards,
Elemento
Hi Elemento,
Thanks for the response. I have been trying to use a vectorized implementation for the cost and gradient descent steps; however, I do not get similar results. I have also tried adding 1's to the input data and the bias to the weight matrix, the same as in the original ML course, but that too gives different outcomes. It would have been good to have a fully vectorized example rather than the for loop for this exercise, especially if that is the preferred approach for performance and is what is used in production.
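The bias-folding I am referring to looks roughly like this (a sketch with placeholder data, not my actual lab code):

import numpy as np

# Placeholder data; the real X (m, n), w (n,), and b come from the lab.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([0.5, -0.5])
b = 1.0

# Fold the bias into the weights by prepending a column of ones to X.
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # (m, n + 1)
w_aug = np.concatenate([[b], w])                  # (n + 1,)

# X_aug @ w_aug equals X @ w + b up to floating-point rounding.
print(np.allclose(X_aug @ w_aug, X @ w + b))      # True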
Hey @ftayyab,
I agree with your viewpoint, but there is another viewpoint I think you might be missing here, i.e., ease of understanding. For a newbie to the world of Machine Learning, or to programming in general, understanding these simple concepts can be quite daunting, and since this specialization was created with the aim of lowering the entry barrier to ML for newbies, starting off with code that is as simple as possible is a nice strategy.
Moreover, both of us know that the vectorized approach is not wrong in any manner whatsoever. It's just that these assignments and optional labs are made to level up gradually, like a sigmoid function, and not in an instant, like the unit step function.
And I would strongly suggest you implement the vectorized approaches yourself for each of the functions, if that is something you have fun doing. You can always comment them out when running the test cases and submitting the assignments. Let me assure you that as you progress through the specialization, these assignments and optional labs will level up accordingly. I hope this helps.
Regards,
Elemento
Just implemented linear regression using the Boston housing data (following a blog) as a sample and was able to get the same results. Not sure why I am not able to match the results for Lab02, adding the bias and not using normalization.
import numpy as np

def calculate_h(X, w):
    # Vectorized predictions: (m, n) @ (n,) -> (m,)
    h = X @ w
    return h

def calculate_cost(X, y, w):
    m = X.shape[0]
    err = (calculate_h(X, w) - y) ** 2
    return (1 / (2 * m)) * np.sum(err)

def gradient_descent(X, y, w, iters, alpha):
    m = X.shape[0]
    for i in range(iters):
        err = calculate_h(X, w) - y
        w = w - (alpha / m) * (X.T @ err)  # vectorized gradient step
        cost = calculate_cost(X, y, w)
        print(f'Cost at current iteration: {i} is {cost}')
    return w
Also normalized the input and added the bias column to the input X. I think I am happy with the Boston example, as it clarifies the steps required for implementing a vectorized version of linear regression.
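For completeness, here is roughly how I would call gradient_descent above with normalization and a bias column (synthetic placeholder data rather than the Boston set; z-score normalization is just one common choice):

import numpy as np

# Synthetic placeholder data, not the Boston dataset.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 4))
y = X_raw @ np.array([3.0, -2.0, 0.5, 1.0]) + 7.0

# Z-score normalization, then a bias column of ones.
X_norm = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X = np.hstack([np.ones((X_norm.shape[0], 1)), X_norm])

w_init = np.zeros(X.shape[1])
w_final = gradient_descent(X, y, w_init, iters=100, alpha=0.1)
print(w_final)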
Hey @ftayyab,
I am glad to hear that you have found something that consolidates your understanding. Just to repeat myself once again, the issue lies with how np.dot()
handles different operands, and not with the dataset.
Regards,
Elemento
I am able to get the gradients to match using a vectorized version. The predictions with the vectorized version (at least for me) do not have the rounding issue, and therefore the cost function values are different. I agree with the wisdom of not including the fully vectorized version at this stage. Because I am a sadist, I also reshaped X so that it has shape (3, 4, 1) and y so that it has shape (3, 1). To get the vectorized version to work, I had to define the dot product function and apply it along the first axis. I'm going to have to study this a lot more before I fully understand it.
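Roughly the kind of thing I mean (a sketch with placeholder values; the einsum version is just one possible way to vectorize it, not necessarily the cleanest):

import numpy as np

# Placeholder values with the shapes described above.
X3 = np.arange(12, dtype=float).reshape(3, 4, 1)  # (3, 4, 1): three examples, each a (4, 1) column
w3 = np.array([[0.1], [0.2], [0.3], [0.4]])       # (4, 1)
y3 = np.array([[1.0], [2.0], [3.0]])              # (3, 1)
m = X3.shape[0]

# Per-example dot products, taken one example (first axis) at a time.
pred_loop = np.array([[np.dot(X3[i].T, w3).item()] for i in range(m)])  # (3, 1)

# One vectorized alternative: einsum sums over the trailing two axes.
pred_vec = np.einsum('ijk,jk->i', X3, w3).reshape(-1, 1)                # (3, 1)

print(np.allclose(pred_loop, pred_vec))                                 # True
print(np.sum((pred_vec - y3) ** 2) / (2 * m))                           # cost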