Please note that your notebooks are private to you, so no one else can peer into the code and debug it for you. This is not a bug in the backend: it is a bug in your code; you just haven’t found it yet. The cause of dimension mismatches like this is almost always referencing global variables instead of the local variables you should be referencing. Put in print statements to check the shape of w (or use %debug) after the return from initialize, after the return from optimize, and right before the call to predict. Does it have shape 4 x 1 in all those places? One common error is to store the return values of optimize in a different dictionary (params vs. parameters) than the one you use to retrieve the value of w before the call to predict.
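Here is a minimal sketch of that shape-checking technique. The stand-in initialize and optimize below are hypothetical simplifications (the real assignment functions take more arguments); the point is just where to put the prints and to retrieve w from the same dictionary you stored it in:

```python
import numpy as np

# Hypothetical stand-ins for the assignment's functions, just to
# illustrate the shape checks -- the real ones take more arguments.
def initialize(dim):
    return np.zeros((dim, 1)), 0.0

def optimize(w, b):
    # Pretend gradient step; returns the updated parameters in a dict.
    return {"w": w - 0.01, "b": b - 0.01}

w, b = initialize(4)
print("after initialize:", w.shape)   # should be (4, 1)

params = optimize(w, b)
w = params["w"]                       # retrieve from the SAME dict you stored into
print("after optimize:", w.shape)     # should still be (4, 1)

print("right before predict:", w.shape)
```

If any of those prints shows a shape other than (4, 1), the bug is in the step just before it.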
Actually you can see that your dw value is already the wrong size (2 x 1), so the bug must happen either in optimize or before the call to optimize. The shape of the gradient of an object should always match the shape of the object itself (dw and w in this case).
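A quick way to catch this early is to assert the gradient's shape right where it is computed. This sketch uses the standard logistic-regression gradient on made-up data (the variable names X, Y, A are assumptions, not the assignment's exact code):

```python
import numpy as np

np.random.seed(0)
w = np.zeros((4, 1))          # 4 features
X = np.random.randn(4, 3)     # 4 features x 3 examples (made-up data)
Y = np.array([[1, 0, 1]])     # labels, shape (1, 3)

# Standard logistic regression forward and backward pass
A = 1 / (1 + np.exp(-(np.dot(w.T, X))))      # sigmoid activations, (1, 3)
dw = np.dot(X, (A - Y).T) / X.shape[1]       # gradient w.r.t. w

# The gradient must have the same shape as the parameter it updates.
assert dw.shape == w.shape, f"dw is {dw.shape}, expected {w.shape}"
```

If that assertion fires, check whether the X you used is the global dataset rather than the argument that was actually passed in.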
One other note: as you discovered, the “deepcopy” calls are not the problem. Those are there to protect against global references, but they only matter if you use “in-place” operators for your “update parameters” step. It’s a pretty subtle point, but worth understanding. Here’s a thread that goes through what is happening there.
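To sketch the subtlety (hypothetical function names, not the assignment's code): an in-place operator like -= mutates the very array object the caller passed in, so without a deepcopy a global array can be modified from inside a function:

```python
import copy
import numpy as np

def update_inplace(w):
    # "-=" mutates the array object that was passed in,
    # so the caller's (possibly global) array changes too.
    w -= 0.1
    return w

def update_safe(w):
    w = copy.deepcopy(w)  # protect the caller's array first
    w -= 0.1              # now the mutation only touches the local copy
    return w

w_global = np.zeros((4, 1))
update_inplace(w_global)
print(w_global[0, 0])   # the global was mutated: -0.1

w_global2 = np.zeros((4, 1))
update_safe(w_global2)
print(w_global2[0, 0])  # unchanged: 0.0
```

Note that `w = w - 0.1` (not in-place) creates a new array and would also leave the caller's copy alone; the deepcopy is insurance for the in-place case.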