Optimization Methods Notebook - Help!

I have been able to successfully create the functions within this notebook - the grader liked them and everything. However, there’s a problem. The functions that were imported initially do not seem to work! Bugs are always my fault no matter how convinced I am it’s the computer’s fault. But the thing is, I didn’t even make the function that’s causing issues - the notebook included it, and it’s causing a float division by zero. Therefore, I can’t see the accuracies of all the different types of optimization methods, because while my functions work, the ‘model’ function doesn’t work. ‘forward_propagation’ is causing a division by zero. Let me include an image:

Any suggestions? Thanks! Again, all the functions I made worked properly, but I can’t actually see the result like the notebook seems to think I should be able to.

The thing to realize is that a perfectly correct function can still throw errors if you pass it bad arguments. So the question is what did you do wrong in your code that causes the m value to be zero down in that function? You can examine the source for the included function by clicking “File → Open” and then opening the correct file. Then work backwards: what could have caused that error? Which of the arguments is wrong? Where did that come from in your code?

Note that m is the size of the minibatch, right? The number of samples. Hmmmmmm. :thinking:

Thank you for your response! Apologies for not trying to debug this sooner, but college is like that sometimes… What you said helped! I’ve identified the problem with your clue - the last mini batch that gets created is empty! What actually appears to happen is that an extra mini batch is created after going through the entire training set.
But what I still don’t understand is why. Why is an extra empty mini batch being created? This messes up the backward_propagation function, due to one of the partial derivative formulas, as I saw in the source document with all the function definitions that you pointed me to (again, thanks). My function that generates random mini batches seems to be working perfectly, but I can’t think of any other culprit that could be producing an extra empty mini_batch (2 total - one for X and one for Y). It DIDN’T seem to be doing that in its initial creation, so I’m having a hard time understanding how I should fix this.
Here’s a link to a screen recording with the issue:

I’ll also paste below my create_random_minibatches function:

{moderator edit - solution code removed}

Your random_mini_batches code is perfect, except for one little thing: it generates the incorrect last minibatch in the case that the minibatch size does not evenly divide the number of samples in the batch. Where did you think that was coming from if not from your code? It might work in the small test cases in the notebook, but it crashes and burns when you use it on real data.

I suggest that instead of the print statements that you have to debug at the end of the routine, try this print loop instead:

for i,mini_batch in enumerate(mini_batches):
    print(f"shape of minibatch X for {i} = {mini_batch[0].shape}")
    print(f"shape of minibatch Y for {i} = {mini_batch[1].shape}")

What does that show you?

Then the question is why is it wrong. The answer has to do with the fact that you misunderstand what the len() function does for a numpy array. Try this and watch what happens:

A = np.random.randn(27, 42)
print(f"len(A) = {len(A)}")

It just gives you the number of elements in the first dimension (dimension index 0). Is that helpful in the case you take len(shuffled_X)? BTW That also explains why taking len(shuffled_Y) doesn’t work.
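Here is a minimal sketch of that comparison, assuming only NumPy and the course's (features, samples) layout. The variable names are made up for illustration:

```python
import numpy as np

# Hypothetical array in the (features, samples) layout: 27 features, 42 samples
A = np.random.randn(27, 42)

n_first = len(A)          # size of dimension 0 only
n_features = A.shape[0]   # the same value as len(A)
n_samples = A.shape[1]    # number of columns, i.e. samples in this layout

print(f"len(A) = {n_first}, A.shape[0] = {n_features}, A.shape[1] = {n_samples}")
```

So len() on a 2D array never tells you how many columns there are.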

Note that the code as written might work if the first dimension of X happens to be larger than the number of samples in the dataset, but that is just an accident. In Python, if you specify a slice range that runs off the end of the array, it just truncates at the actual length of the array instead of raising an error.
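That truncation is easy to check directly. A small sketch with a made-up array X (not the course data):

```python
import numpy as np

# Hypothetical data: 2 features, 10 samples
X = np.arange(20).reshape(2, 10)

# The stop index is far beyond the last column; the slice is truncated, no error
past_end = X[:, 8:1000]
print(past_end.shape)   # (2, 2)
```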

Actually here’s a test case specifically designed to show your bug. Try this with your code and with my suggested print loop:

fooX = np.random.randn(4,130)
fooY = np.random.randn(1,130)
mini_batches = random_mini_batches(fooX, fooY, 64)

Here’s what I get when I run that test:

shape of minibatch X for 0 = (4, 64)
shape of minibatch Y for 0 = (1, 64)
shape of minibatch X for 1 = (4, 64)
shape of minibatch Y for 1 = (1, 64)
shape of minibatch X for 2 = (4, 2)
shape of minibatch Y for 2 = (1, 2)

That makes sense because 2 * 64 = 128 and 130 - 128 = 2, so the last partial minibatch should have 2 elements, right? What do you get when you run that? I predict the last minibatch will have size 0. Why is that?

Finally makes sense, thank you so much for all of your help. My code works now because I used the ‘.shape[1]’ attribute to get my hands on the actual NUMBER of input vectors in the last minibatch (plus the adding/subtracting, what have you…). If I had used ‘.shape[0]’, that would have accomplished the same thing as saying len(…), because that’s what the len function does with numpy arrays. The screenshot below reports my findings (with comments):
Again, my code works now, and I believe I finally understand why.
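For anyone who finds this thread later, here is a minimal sketch of what goes wrong in the 130-sample test case above. The original code was removed, so the slice expressions below are an assumption about its general form, and the variable names (num_complete, start, wrong, right) are made up for illustration:

```python
import numpy as np

fooX = np.random.randn(4, 130)
mini_batch_size = 64

num_complete = fooX.shape[1] // mini_batch_size   # 2 full minibatches
start = num_complete * mini_batch_size            # 128

# Assumed buggy form: len(fooX) is 4, so the column range is 128:4, which is empty
wrong = fooX[:, start:len(fooX)]
# Using the number of columns instead gives the range 128:130
right = fooX[:, start:fooX.shape[1]]
print(wrong.shape, right.shape)   # (4, 0) (4, 2)

# An empty minibatch means m = 0, so any 1/m factor downstream fails
m = wrong.shape[1]
try:
    scale = 1. / m
    hit_zero_division = False
except ZeroDivisionError:
    hit_zero_division = True
print(f"division by zero hit: {hit_zero_division}")
```

The start index is larger than the stop index in the first slice, so NumPy returns zero columns rather than raising an error, and the failure only shows up later when something divides by m.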
Thanks again for all of your help! I hope to continue these courses more quickly during the holidays (although I don’t know because I also need to learn PyTorch and FasterAI stuff). Anyways, should I continue, this will likely not be the last time I need help, so once again thank you so much for guiding me through this debugging process!