Week 2 Assignment Exercise 8 Problems with "w"

Hi, I’ve been trying to debug the following error for the past 1.5 hours, but it seems like there’s something happening in the backend that leads to an incorrect value of w being returned.

The error reads:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 model_test(model)

~/work/release/W2A2/public_tests.py in model_test(target)
    109     y_test = np.array([1, 0, 1])
    110 
--> 111     d = target(X, Y, x_test, y_test, num_iterations=50, learning_rate=1e-4)
    112 
    113     assert type(d['costs']) == list, f"Wrong type for d['costs']. {type(d['costs'])} != list"

<ipython-input-36-b9a9ca57a444> in model(X_train, Y_train, X_test, Y_test, num_iterations, learning_rate, print_cost)
     42     b = params['b']
     43 
---> 44     Y_prediction_train = predict(w,b,X_train)
     45     Y_prediction_test = predict(w,b,X_test)
     46 

<ipython-input-16-b1ae5c93c959> in predict(w, b, X)
     16     m = X.shape[1]
     17     Y_prediction = np.zeros((1, m))
---> 18     w = w.reshape(X.shape[0], 1) # I tested this multiple times and w is returned correctly
     19 
     20     # Compute vector "A" predicting the probabilities of a cat being present in the picture

ValueError: cannot reshape array of size 2 into shape (4,1)

Using %debug, I was able to peek deeper into what was happening:

ipdb> params
{'w': array([[-0.08608643],
       [ 0.10971233]]), 'b': -0.1442742664803268}
ipdb> X.shape
(4, 3)

ipdb> print(initialize_with_zeros(X.shape[0]))
(array([[0.],
       [0.],
       [0.],
       [0.]]), 0.0)
ipdb> grads
{'dw': array([[0.12311093],
       [0.13629247]]), 'db': -0.14923915884638042}

The expectation is that X_train has dimension (4, 3), so w should be a column vector of size 4. This is successfully created by my initialize_with_zeros(), as you can see in the debug output. However, whenever my model() runs, initialize_with_zeros() fails to return the right size: w is stuck at size 2. This leads to the wrong number of parameters being calculated, which is why Python complains that it cannot reshape a (2, 1) vector into the required (4, 1).
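Just to confirm what the error itself means, it reproduces in isolation (a standalone snippet, nothing to do with the notebook):

import numpy as np

w = np.zeros((2, 1))   # the shape predict actually receives
w.reshape(4, 1)        # ValueError: cannot reshape array of size 2 into shape (4,1)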

Now, I have looked through my code a thousand times and there does not seem to be a problem in it. Everything ran fine until Exercise 8. I noticed that there are two unexplained lines in optimize():

#w = copy.deepcopy(w)
#b = copy.deepcopy(b)

I commented them out, since all they do is reassign w and b, but that did not solve the problem.

I suspect there’s something going on in the backend that keeps w stuck at size 2. Or maybe I’m wrong; either way, can you look into this and see what’s going on? Thanks.

Please note that your notebooks are private to you, so no one else can peer into the code and debug it for you. This is not a bug in the backend: it is a bug in your code. You just haven’t found it yet.

The cause of dimension mismatches like this is almost always referencing a global variable instead of the local variable you should be referencing. Put in print statements to check the shape of w (or use %debug) after the return from initialize, after the return from optimize, and right before the call to predict. Does it have shape 4 x 1 in all those places? One common error is to store the return values of optimize in a different dictionary (params or parameters) than the one from which you retrieve w before the call to predict.
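For concreteness, the checks might look something like this (a sketch using the variable names from your own code above; adjust as needed):

w, b = initialize_with_zeros(X_train.shape[0])
print("after initialize:", w.shape)            # expect (4, 1)

params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate)
print("after optimize:", params['w'].shape)    # expect (4, 1)

w = params['w']
b = params['b']
print("before predict:", w.shape)              # must equal (X_train.shape[0], 1)
Y_prediction_train = predict(w, b, X_train)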

Actually, you can see that your dw value is already the wrong size (2 x 1), so the bug must happen either in optimize or before the call to optimize. The shape of the gradient of an object should always be the same as the shape of the object (dw and w in this case).
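A cheap way to enforce that is an assertion right where the gradients are computed (a hypothetical temporary guard, not part of the assignment code):

assert dw.shape == w.shape, f"dw has shape {dw.shape} but w has shape {w.shape}"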

One other note: as you discovered, the deepcopy calls are not the problem. They are there to protect against global references, but they only matter if you use “in place” operators for your “update parameters” step. It’s a pretty subtle point, but worth understanding. Here’s a thread that goes through what is happening there.
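To make the in-place point concrete, here is a standalone sketch (toy functions, not the assignment code):

import numpy as np

def update_inplace(w):
    w -= 0.1          # in-place: mutates the very array the caller passed in
    return w

def update_rebind(w):
    w = w - 0.1       # new array bound to the local name; caller's array untouched
    return w

w_global = np.zeros((2, 1))
update_inplace(w_global)
print(w_global[0, 0])   # -0.1: the global array was modified through the parameter

w_global = np.zeros((2, 1))
update_rebind(w_global)
print(w_global[0, 0])   # 0.0: the global array is unchanged

With -=, the function mutates the caller’s array; with w = w - 0.1, it only rebinds its own local name. The deepcopy guards against the first case.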

OK, thanks for this. I am still unable to resolve it, however. Apparently the trouble arises when optimize() tries to return params:

debug, db = -0.16582694404431458
debug, dw :[[-0.38934561]
 [ 0.10118076]
 [-0.16607398]
 [-0.28602339]]
debug, db :-0.16580948911996324
debug, dw = [[-0.38934561]
 [ 0.10118076]
 [-0.16607398]
 [-0.28602339]]
debug, db = -0.16580948911996324
DEBUG: Params, pre return, in optimize:
{'w': array([[ 0.00194946],
       [-0.0005046 ],
       [ 0.00083111],
       [ 0.00143207]]), 'b': 0.0008311888169172717}
DEBUG: the entirety of params, post return, now in main
{'w': array([[-0.08608643],
       [ 0.10971233]]), 'b': -0.1442742664803268}

The return value of optimize is defined as follows:

# Record the costs
if i % 100 == 0:
    costs.append(cost)

    # Print the cost every 100 training iterations
    if print_cost:
        print ("Cost after iteration %i: %f" %(i, cost))

params = {"w": w,
          "b": b}

grads = {"dw": dw,
         "db": db}

print("DEBUG: Params, pre return, in optimize:")
print(params)

return params, grads, costs

The relevant part of model() is defined as follows:

parmas, grads, costs = optimize(w,b,X_train,Y_train, num_iterations, learning_rate)
    
    print("DEBUG: the entirety of params, post return, now in main")
    print(params)
    
    w = params['w']
    b = params['b']

I don’t see anything wrong with how I returned params. In fact, that was not a part I was supposed to change in the first place.

The problem is what I suggested earlier: what is the name of the actual output variable in which you store the return values of optimize for the parameters? And where do you retrieve w from before the call to predict?

You need to be clear on the concept of variable scope. There’s nothing weird about Python’s scoping model. The variable params down inside optimize is different from the variable params in the scope of the model function, right?
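Here’s a minimal standalone example of that point:

def optimize_demo():
    params = {"w": "local value"}   # exists only inside this function
    return params

params = {"w": "outer value"}       # a completely separate variable
result = optimize_demo()
print(params["w"])   # outer value: unchanged, because the return went into result
print(result["w"])   # local value

The assignment params, grads, costs = optimize(...) is exactly how you bridge the two scopes, which is why it has to use the same name you read w from later.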

Interesting. I changed params to p when calling optimize, and now it works.
Can you explain why I can’t use params when calling optimize?

You can. You just have to be consistent about it.

I understand the difference between global and local variables from coding in other languages. Obviously, the params in optimize is not the same as the params in the model() block. But the way I originally defined it was the following:

params, grads, costs = optimize(w,b,X_train,Y_train, num_iterations, learning_rate)

My understanding is that this assigns the first returned item to params within model(), so in theory it should work. Obviously there might be something different about Python that I’m misunderstanding, which is why I’m asking for clarification.

No, this is not some weirdness or subtlety in Python. Where does the w value that you pass to predict come from? You are retrieving it from a dictionary, right? Which dictionary?

It was from params, after the call to optimize():

 params, grads, costs = optimize(w,b,X_train,Y_train, num_iterations, learning_rate)
    
    print("DEBUG: the entirety of params, post return, now in main")
    print(params)
    
    w = params['w']
    b = params['b']
    

    print("DEBUG: w, from params['w'] post optimize :")
    print(w)
    
    Y_prediction_train = predict(w,b,X_train)

OK, that looks correct, although your indentation is a little sketchy. Indentation matters in Python, right?

So what does that print statement show before the call to predict?

One other subtlety that might be interfering here: you do realize that just typing new code into a code cell doesn’t do anything, right? The new code doesn’t take effect until you actually execute that cell (“Shift-Enter”). Just calling the function again without doing that runs the previous version of the code.

If you want to be sure that “what you see is what you get”, first do: “Kernel → Restart and Clear Output” and then “Cell → Run All”.

Yes, I ran the new code.

As to your previous post:

DEBUG: the entirety of params, post return, now in main
{'w': array([[ 0.00194946],
       [-0.0005046 ],
       [ 0.00083111],
       [ 0.00143207]]), 'b': 0.0008311888169172717}
DEBUG: w, from params['w'] post optimize :
[[ 0.00194946]
 [-0.0005046 ]
 [ 0.00083111]
 [ 0.00143207]]
All tests passed!

And this time everything was called params. I think there was a typo somewhere that I fixed while renaming things to p.

But now it works, and thank you for your help!

Whew! You had me worried there for a while. Glad to hear that it’s all straightened out now.

Yes, it was a stupid typo. You can see it in my previous post:

parmas, grads, costs = optimize(w,b,X_train,Y_train, num_iterations, learning_rate)
    
    print("DEBUG: the entirety of params, post return, now in main")
    print(params)
    
    w = params['w']
    b = params['b']

The returned object is called parmas, not params…

Thanks again, and sorry for doubting the backend.


Ahhh, that would explain a lot. Sorry that I missed it. It’s really easy to overlook something like that, especially when there’s a global version of params, so it doesn’t even show up as an undefined reference later…
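For the benefit of anyone who finds this thread later, here’s a standalone sketch (hypothetical values) of why that failure mode is so quiet:

params = {"w": "stale value from an earlier cell"}   # leftover global in the notebook

def model_demo():
    parmas = {"w": "fresh value"}   # typo: creates a brand-new local name
    return params["w"]              # silently reads the stale global instead of
                                    # raising a NameError

print(model_demo())   # stale value from an earlier cell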