Can't manually calculate np.dot() for first feature vector in C1_W2_Lab02_Multiple_Variable_Soln

Hi,

Although I have finished the course, I'm going back through a couple of things and calculating them manually. I'm currently on Multiple Variable Linear Regression, specifically the following function:

import numpy as np

def compute_cost(X, y, w, b): 
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
    Returns:
      cost (scalar): cost
    """
    
    m = X.shape[0]
    cost = 0.0
    for i in range(m):                                
        f_wb_i = np.dot(X[i], w) + b           #(n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2       #scalar
    cost = cost / (2 * m)                      #scalar, divide once after the loop
    return cost

If I set m = 1 and run the modified code:

    m=1
    i=0
    cost = 0.0                             
    f_wb_i = np.dot(X[i], w) + b           #(n,)(n,) = scalar (see np.dot)
    return f_wb_i

Then I get 459.99999761940825, but in Excel I get 457.2311368.

X = [2104, 5, 1, 45] Note I’m only showing the first feature vector rather than all 3.
y = [460, 232, 178]
w = [0.39, 18.75, -53.36, -26.42]
b = 785.1811367994083

Calculating WX
My understanding of the dot product was X0*W0 + X1*W1 + Xn*Wn

So I get:
820.56+93.75-53.36-1188.9 = -327.95

+b to get the model for when i = 0
=-327.95 + 785.18
= 457.23

I’m confused as to why np.dot() is giving a different value and where I have screwed up.
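For anyone following along, here is a minimal sketch of the same hand calculation in NumPy. It assumes the rounded w values typed out in the post above (not the full-resolution ones from the notebook), and the names `x0`, `w_rounded`, `manual` and `via_dot` are just illustrative.

```python
import numpy as np

# Values copied from the post above; w is the rounded version shown there.
x0 = np.array([2104, 5, 1, 45])
w_rounded = np.array([0.39, 18.75, -53.36, -26.42])
b = 785.1811367994083

# For 1-D inputs, np.dot is the sum of the element-wise products.
products = x0 * w_rounded
manual = products.sum() + b
via_dot = np.dot(x0, w_rounded) + b

print(products)          # the four terms of the hand calculation
print(manual, via_dot)   # both should agree with the spreadsheet value
```

With the same rounded inputs, np.dot and the spreadsheet agree, which suggests the difference comes from the inputs rather than from np.dot itself.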

That’s right, @Frazer_Mann! (I added the “…”)

Btw, it’s better to verify with integer values (other than zeros and ones). Even when you print a floating-point value, the output is not guaranteed to show every decimal place, and if decimal places are missing from your calculation, the result will be less precise.

Cheers,
Raymond

You posted in “MLS Resources”, but that’s not the best place for a question about a specific technical item in the course.

I’ve moved your thread to the correct location.

Try your spreadsheet calculation using the full-resolution values for w and b.
You can find them in the 4th notebook cell:

Thanks very much.

OK, so I think it's a rounding error.

I have manually set m to 1 so it only does the first feature vector, and I've asked it to return

f_wb_i

The returned value is 459.9999976…

But if I declare the arrays as float rather than int64 (which is what it said they were when I queried the datatype), I get the same value as I get in Excel.

I'm a little surprised a rounding error made such a big difference.
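A rough sketch of why the gap can be this large, again assuming the rounded w values from the first post. As far as I know, NumPy promotes an int64 array to float64 when it is multiplied with float64 weights, so the integer dtype should not change the result by itself. What matters is that the first feature is 2104, so even a tiny change in its weight is multiplied by 2104. The variable names here are only illustrative.

```python
import numpy as np

x_int = np.array([2104, 5, 1, 45])            # integer dtype, as in the lab
x_float = x_int.astype(np.float64)
w_rounded = np.array([0.39, 18.75, -53.36, -26.42])

# Mixed int/float inputs are promoted to float64 before multiplying.
result_dtype = np.result_type(x_int.dtype, w_rounded.dtype)
same = np.isclose(np.dot(x_int, w_rounded), np.dot(x_float, w_rounded))

# Rounding each weight to 2 decimals can shift it by up to 0.005,
# so the prediction can shift by up to 0.005 * sum(|x|).
max_shift = 0.005 * np.abs(x_int).sum()

# Gap between the two numbers reported in the first post.
observed_gap = 459.99999761940825 - 457.2311368

print(result_dtype, same)
print(max_shift, observed_gap)
```

The worst-case shift for this feature vector works out to 0.005 * 2155 = 10.775, so the observed gap of roughly 2.77 is well within what two-decimal rounding of w can produce.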

How do you get around the fact that NumPy doesn't support decimal datatypes when you’re dealing with financial or engineering data?

Thanks again for your help.

This is not a fact. Numpy works perfectly well with floating point values.

I thought there was a difference between float and decimal datatypes? (You know a lot more about this than I do, so I'm guessing I'm mistaken.)

Depends on what you mean by “decimal”.

Do you mean “integer”?

Integers are just simplified floating point numbers.

I’m used to C#, where int, float, double and decimal are all distinct numerical data types, and the calcs I was planning to do were going to require a large number of decimal places. I've done some more googling, and it appears as though there are only int and float data types in Python, and it seems like I should be using float32 when I finally get onto neural networks, assuming I want to maintain as much precision as possible?

Thanks again for your time / correcting my earlier post. Hope you have a great weekend :slight_smile:
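On the float vs. decimal question, a small sketch that may help. Python's standard library does ship a `decimal` module with a `Decimal` type for exact base-10 arithmetic, separate from the binary `float`. On the NumPy side, float64 is the default float and carries more significant digits than float32. The variable names below are only illustrative.

```python
from decimal import Decimal

import numpy as np

# Binary floats cannot represent 0.1, 0.2 or 0.3 exactly.
float_equal = (0.1 + 0.2 == 0.3)

# Decimal does base-10 arithmetic, so this comparison is exact.
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# Approximate number of reliable decimal digits for each NumPy float type.
digits32 = np.finfo(np.float32).precision
digits64 = np.finfo(np.float64).precision

print(float_equal, decimal_equal)
print(digits32, digits64)
```

So if the goal is to keep as much precision as possible, float64 holds more digits than float32. float32 is usually chosen for speed and memory, not for precision.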