C5 W1 Assignment 2 "wrong values"

Hello there!

All was going well in the Dinosaurus sequence model assignment.
But in Exercise 2 I am seeing the error:

Sampling:
list of sampled indices:
 [23, 7, 15, 26, 25, 23, 21, 14, 23, 23, 7, 16, 26, 24, 18, 14, 10, 0]
list of sampled characters:
 ['w', 'g', 'o', 'z', 'y', 'w', 'u', 'n', 'w', 'w', 'g', 'p', 'z', 'x', 'r', 'n', 'j', '\n']
---------------------------------------------------------------------------

..
...

AssertionError: Wrong values

I can see that the values are nonsense. I’ve checked and re-checked for bugs, and I can’t find any. I wonder whether there is something quite fundamental in the exercise that I’m not understanding.
The assignment gives hints on how to use ravel but I don’t see any need to. I wonder whether this is a clue to where I’m going wrong?

This is my step 3, which I think might be where the problem lies?

# Step 3: Sample the index of a character within the vocabulary from the probability distribution y
# (see additional hints above)
idx = np.random.choice(range(vocab_size), p = y[:,counter])

# Append the index to "indices"
indices.append(idx)

Any pointers would be appreciated!

It appears that you ignored the “additional hint” about using ravel().

p=y.ravel()
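
As a rough illustration of what the hint does, assuming y is the softmax output for a single time step stored as a column vector of shape (vocab_size, 1): ravel() flattens it to a 1-D array without changing the values. The vocabulary size of 27 and the random scores below are stand-ins, not the assignment's data.

```python
import numpy as np

vocab_size = 27  # assumed: 26 letters plus the newline character

# Stand-in for one time step of softmax output, stored as a column vector.
rng = np.random.default_rng(0)
scores = rng.normal(size=(vocab_size, 1))
y = np.exp(scores) / np.exp(scores).sum(axis=0)

# ravel() flattens (27, 1) to (27,); the probabilities are unchanged.
p = y.ravel()

print(y.shape, p.shape)
```

If y has this shape, the flattened vector still sums to 1, so it can serve as a probability vector directly.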

Hi @TMosh, thanks for the prompt feedback. I figured it had something to do with the hint, but I’m now more confused. Blindly using y.ravel() as the list of probabilities doesn’t work in my code, and I wouldn’t expect it to, so I must be doing something wrong somewhere else.

Here’s what I thought we were doing:

  • In step two, generating “y”, a 2d matrix of predictions for y.
  • Then in step 3, taking the 1d vector from y which represents the step (counter) that we’re interested in, y(t+1), whose probabilities sum to one (they do) and then using that in the np.random.choice to choose a likely letter from the dictionary.

Using y.ravel() would surely flatten the (27, 100) matrix into a 1-D vector of length 2700, which is no use for the choice function because its probabilities add up to 100.

As I say, I’m pretty sure that I’m misunderstanding what we’re trying to do. Could you help my thinking?

Here’s the remainder of the code for context:

        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        a = np.tanh( np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
        z = np.dot(Wya, a) + by
        y = softmax(z)
        
        print(np.sum(y[:,counter])) ## returns ~1
        
        # For grading purposes
        np.random.seed(counter + seed) 
        
        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        # (see additional hints above)
        idx = np.random.choice(len(y.ravel()), p = y.ravel()) ## error
        print(idx)
        
        # Append the index to "indices"
        indices.append(idx)
        
        # Step 4: Overwrite the input x with one that corresponds to the sampled index `idx`.
        # (see additional hints above)
        x = np.zeros(vocab_size)
        x[idx] = 1
        
        # Update "a_prev" to be "a"
        a_prev = a

My comment about using “p=y.ravel()” only referred to replacing that part of your code. You still need range(vocab_size) as the first parameter.
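
A minimal sketch of that call, assuming y is a (vocab_size, 1) column vector. The uniform distribution and the seed value here are placeholders for illustration only. It also shows that np.random.choice rejects a 2-D p, which is why the flattening is needed.

```python
import numpy as np

vocab_size = 27  # assumed vocabulary size

# Placeholder distribution: uniform over the vocabulary, as a column vector.
y = np.full((vocab_size, 1), 1.0 / vocab_size)

np.random.seed(0)  # placeholder seed, only to make the draw repeatable

# First argument: the candidate indices. p: the flattened probabilities.
idx = np.random.choice(range(vocab_size), p=y.ravel())

# Passing the 2-D column vector directly is rejected by NumPy.
try:
    np.random.choice(range(vocab_size), p=y)
    two_d_accepted = True
except ValueError:
    two_d_accepted = False

print(idx, two_d_accepted)
```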

Also, anywhere you’re using np.zeros(vocab_size), you probably need to use
np.zeros((vocab_size, 1))
…because x needs to be a column vector of shape (vocab_size, 1), and np.zeros takes a multi-dimensional shape as a tuple.
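
The effect of the shape can be sketched as follows. This is an illustration under assumptions, not the assignment's code: n_a = 100 is inferred from the (27, 100) shape mentioned earlier, the weights are random stand-ins, and softmax is a hypothetical column-wise implementation. With a 1-D x, broadcasting turns the hidden state into an (n_a, n_a) matrix, so y comes out as (27, 100) with columns that each sum to 1.

```python
import numpy as np

n_a, vocab_size = 100, 27  # n_a = 100 is an assumption based on the (27, 100) shape

rng = np.random.default_rng(0)
Wax = rng.normal(size=(n_a, vocab_size))
Waa = rng.normal(size=(n_a, n_a))
Wya = rng.normal(size=(vocab_size, n_a))
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))
a_prev = np.zeros((n_a, 1))

def softmax(z):
    # Hypothetical column-wise softmax: each column sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

def forward(x):
    # Same three equations as the forward step in the post above.
    a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
    return softmax(np.dot(Wya, a) + by)

x_flat = np.zeros(vocab_size)       # shape (27,)
x_flat[3] = 1
x_col = np.zeros((vocab_size, 1))   # shape (27, 1)
x_col[3] = 1

y_flat = forward(x_flat)  # (100,) + (100, 1) broadcasts to (100, 100)
y_col = forward(x_col)    # stays a column vector throughout

print(y_flat.shape, y_col.shape)
```

With the column-vector input, y keeps the shape (vocab_size, 1), and y.ravel() is then a length-27 vector that sums to 1.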
