Vectorization using np.sum()

I tested the speed of two functions: compute_cost, which comes from the optional labs, and compute_cost2, which I wrote myself. In compute_cost2 I use np.sum() to add up the squared errors before multiplying by 1 / (2 * m) to get the final cost. Surprisingly, compute_cost turned out to be faster than compute_cost2, although I expected compute_cost2 to be the faster one.
I first tried a small dataset and then the big one, and both gave the same result: compute_cost is faster than compute_cost2.

import numpy as np

X_train = np.random.randint(200, size=(10000000, 50))
y_train = np.random.randint(100, size=(10000000))
w_init = np.random.rand(50)
b_init = np.random.rand()

def compute_cost(X, y, w, b): 
    m = X.shape[0]
    cost = 0.0
    for i in range(m):                                
        f_wb_i = np.dot(X[i], w) + b           #(n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2       #scalar
    cost = cost / (2 * m)                      #scalar    
    return cost

def compute_cost2(x, y, w, b):
    m = x.shape[0]
    y_hat = np.zeros(m)
    for i in range(m):
        y_hat[i] = np.dot(x[i], w) + b

    cost = np.sum((y_hat - y) ** 2)   # sum of squared errors
    cost = cost / (2 * m)
    return cost

Benchmark test:

import time 
tic = time.time()  # capture start time
cost2 = compute_cost2(X_train, y_train, w_init, b_init)
toc = time.time()  # capture end time

print(f" compute_cost2 function duration: {1000*(toc-tic):.4f} ms ")

tic = time.time()  # capture start time
cost1 = compute_cost(X_train, y_train, w_init, b_init)
toc = time.time()  # capture end time

print(f"compute_cost function duration: {1000*(toc-tic):.4f} ms ")

The result:

compute_cost2 function duration: 80717.3033 ms 
compute_cost function duration: 78757.5171 ms 

You are already looping over all the examples, which is unnecessary when using NumPy. You can simply use np.dot(X, w) + b, which should speed things up considerably.

I like to look at the shapes for matrix multiplications:
X has shape (10M, 50)
w has shape (50,) (you could reshape this to (50, 1))
The matrix dot product of (10M, 50) and (50, 1) gives an output of shape (10M, 1)

Your loop forces NumPy to multiply (1, 50) by (50, 1) ten million times, which is inefficient.
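To make the suggestion concrete, here is a minimal sketch of a fully vectorized cost function (the function name is mine, not from the labs); it replaces the per-example loop with a single matrix-vector product:

```python
import numpy as np

def compute_cost_vectorized(X, y, w, b):
    m = X.shape[0]
    f_wb = np.dot(X, w) + b                  # (m, n) @ (n,) -> (m,)
    cost = np.sum((f_wb - y) ** 2) / (2 * m) # mean squared error / 2
    return cost
```

On small inputs it returns the same value as the looped versions, but the whole prediction step happens in one BLAS call instead of m separate dot products.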

The following shows the time difference for np.sum very well (about 10 s vs. 0.3 s for me):

import time

tic = time.time()  # capture start time
m = X_train.shape[0]
total = 0
for i in range(m):
    total += X_train[i]
print(total)
toc = time.time()  # capture end time

print(f"For-loop sum duration: {1000*(toc-tic):.4f} ms")

tic = time.time()  # capture start time
total = np.sum(X_train, axis=0)
print(total)
toc = time.time()  # capture end time

print(f"Numpy sum duration: {1000*(toc-tic):.4f} ms")

NumPy is designed for efficient matrix algebra. Avoid for-loops over examples if you want your code to run efficiently.
