Emoji_v3a model() must give perfect accuracy

I can’t find the bug in this part of the code:

# Optimization loop
    for t in range(num_iterations): # Loop over the number of iterations
        for i in range(m):          # Loop over the training examples
            
            ### START CODE HERE ### (≈ 4 lines of code)
            {mentor edit: code removed}

The failing assertion is ‘‘model() must give perfect accuracy’’, and I get pred = [0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1.] instead of Y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1].
In the training cell below there also seems to be a mismatch between the dimensions of avg and b, even though avg in sentence_to_avg is initialized correctly with np.zeros(word_to_vec_map[any_word].shape).

Help, I’ve been staring at these 4 lines of code for an hour.

Best,
Hannes

“forward propagate the avg” means you multiply by W and add b.
The instructions for Exercise 2 give you the equation.

Also, you should check your code for computing the cost.
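Concretely, in NumPy terms those two steps look roughly like the sketch below. This is only an illustration, not the graded template: I’m assuming softmax is already defined and that W, b, avg, and Y_oh[i] have the shapes the notebook sets up.

    # Forward propagate avg: multiply by W, add b, then apply softmax
    z = np.dot(W, avg) + b    # W is (n_y, n_h), avg is (n_h,), so z is (n_y,)
    a = softmax(z)            # predicted class probabilities, shape (n_y,)

    # Cross-entropy cost for the i-th example, with Y_oh[i] the one-hot label
    cost = -np.sum(Y_oh[i] * np.log(a))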

Oh dear, I didn’t see the “.” in the formula. But the cost should be fine; it’s the vectorized version of the sum ;).

Does your code work correctly now?

Thanks, yes! After adding the dot product, everything worked out!!

A note about using np.average would be useful in this problem as well.
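Something like this hypothetical helper shows the idea; it assumes word_to_vec_map maps lowercase words to NumPy vectors, as in the notebook, and is only a sketch, not the graded solution.

    import numpy as np

    def average_word_vectors(sentence, word_to_vec_map):
        # Split the sentence into lowercase words and collect their embedding vectors
        words = sentence.lower().split()
        vectors = [word_to_vec_map[w] for w in words if w in word_to_vec_map]
        if not vectors:
            # No known words: return a zero vector with the embedding dimension
            any_word = next(iter(word_to_vec_map))
            return np.zeros(word_to_vec_map[any_word].shape)
        # np.average over axis 0 takes the element-wise mean of the stacked vectors
        return np.average(vectors, axis=0)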

I’m completely stuck on this section. Unlike the OP, the shapes of avg, W, a, and z all seem to be correct.

The problem seems to be the calculation of the cost function. I have tried different variations, using np.dot and *, but the result is always the same: “AssertionError: Model must give a perfect accuracy”.

The accuracy of the model decreases as well:
Epoch: 0 — cost = 2.664198098365268
Accuracy: 0.9166666666666666
Epoch: 100 — cost = 96.19998254000362
Accuracy: 0.5

Any suggestions would be appreciated.
Matt

Well, notice that your cost is going up rather than down with more iterations. Maybe the problem is not how you compute the cost, but your gradients and how you are applying them. E.g. are you subtracting the gradient terms (times learning rate of course) or maybe adding them? :scream_cat:

Also note that this thread is more than a year old, so there is no guarantee that any of the participants are still listening. I just happened to notice because I had set “Watching” on this thread back when it first happened.

Hi Paul, thanks for answering.

So the gradients are computed for us; they are outside of the area we are supposed to code. I did look at them, but I don’t see any glaring issues, and I assume someone else would have asked the question by now if that were the problem.

Here are the gradient computation and updating sections we are given:

        # Compute gradients
        dz = a - Y_oh[i]
        dW += -np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
        db += dz

        # Update parameters with Stochastic Gradient Descent
        W = W - learning_rate * dW
        b = b - learning_rate * db

Good point. Sorry, I forgot to notice which parts were the template. So it must just be your cost code itself.

Ok, looking again more closely at the code and thinking \epsilon harder, notice that both the cost and the gradients depend on the value of a, which is computed by your code. Nothing depends on the cost. Maybe your logic for computing a is incorrect, but it should be pretty simple. The math formula is:

a = softmax(W \cdot avg + b)

Not much that could go off the rails there. If that suggestion doesn’t pan out, it’s time to just look at your code. We aren’t supposed to do that on a public thread, but I’ll send you a DM about how we can proceed with that.
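For reference, a numerically stable softmax can be sketched in plain NumPy like this. The notebook supplies its own softmax helper, so treat this only as an illustration of what the formula above expects.

    import numpy as np

    def softmax(z):
        # Shift by the max before exponentiating to avoid overflow
        e = np.exp(z - np.max(z))
        return e / np.sum(e)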

Ok, to close the loop on the public thread: we had a follow-up conversation by DM, and it turns out the problem is simple and you can see it in the fragment of the template code that Matt shows above. Notice that an errant minus sign got added to the code that computes dW. Sorry that I didn’t spot that earlier; it took a direct, line-by-line comparison with my code to finally spot it.