ValueError: operands could not be broadcast together with shapes (2,7) (2,)

C3W1_Assignment
Last Checkpoint: 13/12/2023

#UNIT TESTS
w1_unittest.test_test_model(log_perplexity)

Description: One of the unit test cases injects an invalid value into the dimensions of the preds tensor; as a result, the target sequence is corrupted.

Example below:
Preds Shape: (2, 1, 7, 3) Targets shape:(2, 7) (2, 7, 3)
Log Shape: (2, 2, 7), non_pad Shape (2, 7)

Expected preds shape: (2, 7, 3)
Erroneous preds shape: (2, 1, 7, 3)

Preds Shape: (1, 5, 3) Targets shape:(1, 5) (1, 5, 3)
Log Shape: (1, 5), non_pad Shape (1, 5)

Preds Shape: (1, 5, 3) Targets shape:(1, 5) (1, 5, 3)
Log Shape: (1, 5), non_pad Shape (1, 5)

Preds Shape: (1, 5, 3) Targets shape:(1, 5) (1, 5, 3)
Log Shape: (1, 5), non_pad Shape (1, 5)

Preds Shape: (1, 5, 3) Targets shape:(1, 5) (1, 5, 3)
Log Shape: (1, 5), non_pad Shape (1, 5)

Preds Shape: (1, 8, 5) Targets shape:(1, 8) (1, 8, 5)
Log Shape: (1, 8), non_pad Shape (1, 8)

Preds Shape: (1, 8, 5) Targets shape:(1, 8) (1, 8, 5)
Log Shape: (1, 8), non_pad Shape (1, 8)

Preds Shape: (1, 8, 5) Targets shape:(1, 8) (1, 8, 5)
Log Shape: (1, 8), non_pad Shape (1, 8)

Preds Shape: (1, 7, 3) Targets shape:(1, 7) (1, 7, 3)
Log Shape: (1, 7), non_pad Shape (1, 7)

Preds Shape: (2, 1, 7, 3) Targets shape:(2, 7) (2, 7, 3)
Log Shape: (2, 2, 7), non_pad Shape (2, 7)


ValueError Traceback (most recent call last)
Cell In[144], line 2
1 #UNIT TESTS

(1, 7, 3) 3
Preds Shape: (1, 7, 3) Targets shape:(1, 7) (1, 7, 3)
Log Shape: (1, 7), non_pad Shape (1, 7)

(2, 1, 7, 3) 3
Preds Shape: (2, 1, 7, 3) Targets shape:(2, 7) (2, 7, 3)
Log Shape: (2, 2, 7), non_pad Shape (2, 7)

1 Like

Hi @vijayneuralnet

Check this explanation to understand the calculations.

Having said that, the most common mistake is forgetting axis= -1 where appropriate (three instances of np.sum).
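For anyone unsure what that hint means, here is a minimal NumPy sketch (with made-up probabilities, not the assignment's actual test data) showing why the sum must run over the last axis, i.e. the vocabulary axis:

```python
import numpy as np

# Toy batch: 1 sequence of 5 tokens over a 3-word vocabulary (made-up values).
preds = np.array([[[0.1, 0.5, 0.4],
                   [0.05, 0.9, 0.05],
                   [0.2, 0.3, 0.5],
                   [0.1, 0.2, 0.7],
                   [0.2, 0.8, 0.1]]])          # shape (1, 5, 3)
target = np.array([[1, 1, 2, 2, 0]])           # shape (1, 5)

one_hot = np.eye(3)[target]                    # shape (1, 5, 3)
log_p = np.sum(preds * one_hot, axis=-1)       # shape (1, 5): one value per token
print(log_p)                                   # [[0.5 0.9 0.5 0.7 0.2]]
print(np.sum(preds * one_hot))                 # without axis=-1: a single scalar
```

Summing without axis=-1 collapses everything to one number, which no longer has the per-token shape that the non-padding mask expects.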

Cheers

2 Likes

Hi @arvyzukai ,

I have added this piece of code for your reference…

This error is not emanating from here but from the unit test case for preds with dimensions (2, 1, 7, 3). I have tried tf.squeeze(preds, axis=1) to bring it to (2, 7, 3), but the error still persists.

This is unrelated to np.sum(preds*target,axis=-1)

I have tried handling case to identify the preds dimension.

### START CODE HERE ###
    shp1,shp2 = preds.shape, target.shape
    mism = [i for i,w in enumerate(shp1.as_list()) if w not in shp2.as_list()]
    print(shp1,shp2,mism)
    if len(mism)==1:
        target_reshaped = tf.one_hot(target,preds.shape[mism[0]])
    else:
        preds = tf.squeeze(preds,axis=mism[0])
        target_reshaped = tf.one_hot(target,preds.shape[-1])

#     print(tf.reshape(target,shape=new_shape))
  # Calculate log probabilities for predictions using one-hot encoding
    print(f"Preds Shape: {preds.shape} Targets shape:{target.shape} {tf.one_hot(target,preds.shape[-1]).shape}")
    log_p = np.sum(preds * target_reshaped ,axis=-1) # HINT: tf.one_hot() should replace one of the Nones
    
1 Like

Hi @vijayneuralnet

Note the instructions for Exercise 5 - log_perplexity:

You can use tf.one_hot to transform the target into the same dimension. You then multiply them and sum them.

Your implementation (for target_reshaped) is very messy and hard to follow; a cleaner solution would look like:
tf.one_hot(?, ?.shape[-1])
This one line of code is enough to get your target_reshaped.

Cheers

1 Like

Issue is here:

This test case works:
Preds Shape: (1, 7, 3) Targets shape: (1, 7) Target_reshaped: (1, 7, 3)
Log Shape: (1, 7), non_pad Shape (1, 7)

The test case below is invalid:
Preds Shape: (2, 1, 7, 3) Target shape: (2, 7) Target_Reshaped: (2, 7, 3)

Note: preds.shape[-1] only addresses the last dimension, while the invalid case also differs in an extra dimension.

It fails because of the extra dimension of size 1; for this to work, that dimension must be squeezed out of preds.

Since preds and target_reshaped have different numbers of dimensions, this implementation fails.

#  Reshaped the Targets
target_reshaped = tf.one_hot(target,preds.shape[-1])


print(f"Preds Shape: {preds.shape} Targets shape:{target.shape} Target_Reshaped: {tf.one_hot(target,preds.shape[-1]).shape}")

#  Calculate log probabilities for predictions using one-hot encoding
log_p = np.sum(preds * target_reshaped, axis=-1)
1 Like

Hi, @arvyzukai

I’m currently working on the section about calculating the log perplexity of a language model. Unfortunately, I’ve hit a roadblock due to a recurring InvalidArgumentError.

Error Context:
The error occurs in a cell where the split_input_target function is used. Here’s the problematic code snippet:

eval_text = "\n".join(eval_lines)
eval_ids = line_to_tensor([eval_text], vocab)
input_ids, target_ids = split_input_target(tf.squeeze(eval_ids, axis=0))

The error message indicates an issue with slicing a scalar input:

InvalidArgumentError: Attempting to slice scalar input. [Op:StridedSlice]

Troubleshooting Steps:
I’ve tried debugging by printing tensor shapes and examining variable values, but I’m constrained by the course guidelines, which restrict modifications only to certain sections of the notebook.

Reproducibility:
For your convenience, I’ve isolated the error in the attached minimal code snippet, which can be run independently to reproduce the issue.

Request for Guidance:
Considering the constraints on code modification, I would greatly appreciate your insights on:

  • Specific aspects of the data processing pipeline that might need revision.
  • Any potential issues with the input text formatting.
  • Suggestions for alternative debugging strategies within the allowed code sections.

Your expertise and guidance would be invaluable in helping me understand and resolve this issue.

1 Like

Hi @RyeToast

The problem most probably lies in your Exercise 01 - line_to_tensor(). This error says that you tried to slice a scalar (for example, 4[:3], which is not possible).

Exercise 01 is a very simple exercise which you should’ve implemented with two lines of code (identical to the code a couple of cells above). This function should’ve returned an ids variable which you could slice (for example, ids[:3] should be possible) because ids should’ve been a tensor.
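To see the difference in plain Python (a hypothetical illustration, not the assignment code): slicing works on a sequence of ids, but not on a scalar:

```python
# A scalar cannot be sliced; that is exactly what the
# "Attempting to slice scalar input" error is complaining about.
scalar = 4
try:
    scalar[:3]
except TypeError as err:
    print("cannot slice a scalar:", err)

# A sequence of ids (what line_to_tensor should return, as a tensor) can be sliced.
ids = [72, 101, 108, 108, 111]
print(ids[:3])   # [72, 101, 108]
```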

Cheers

1 Like

I am running into the exact same problem which appears to be with a test case which has:
a. preds dimension (2, 1, 7, 3) i.e., FOUR dimensions while
b. targets dimension is only 2 dimensions (2, 7)
The case where the number of preds dimensions differs from the number of target dimensions by 2 was not covered in the lecture or the lab. Using np.squeeze(preds) to get to shape (2, 7, 3) still leads to an error.
Please HELP. I would really appreciate a clarification on this point. Thank you.

Hi @Cawnpore_Charlie

This unit test is a bit contrived (most probably accidentally, though it could be intentional).

Let me explain the last test case (2,1,7,3) that you’re having problems with. (Usually, this is not a normal input shape in NLP but it is used in the test case).

So, in the unit test case the preds are:

<tf.Tensor: shape=(2, 1, 7, 3), dtype=float32, numpy=
array([[[[0.1 , 0.5 , 0.4 ],
         [0.05, 0.9 , 0.05],
         [0.2 , 0.3 , 0.5 ],
         [0.1 , 0.2 , 0.7 ],
         [0.2 , 0.8 , 0.1 ],
         [0.4 , 0.4 , 0.2 ],
         [0.5 , 0.  , 0.5 ]]],


       [[[0.1 , 0.5 , 0.4 ],
         [0.2 , 0.8 , 0.1 ],
         [0.4 , 0.4 , 0.2 ],
         [0.5 , 0.  , 0.5 ],
         [0.05, 0.9 , 0.05],
         [0.2 , 0.3 , 0.5 ],
         [0.1 , 0.2 , 0.7 ]]]], dtype=float32)>

the targets are:

<tf.Tensor: shape=(2, 7), dtype=int32, numpy=
array([[1, 2, 0, 2, 0, 2, 0],
       [2, 1, 1, 2, 2, 0, 0]], dtype=int32)>

when you one_hot encode the targets, you get:

<tf.Tensor: shape=(2, 7, 3), dtype=float32, numpy=
array([[[0., 1., 0.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.]],

       [[0., 0., 1.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.],
        [1., 0., 0.],
        [1., 0., 0.]]], dtype=float32)>

when you multiply that with preds, broadcasting comes into play, so you get shape (2, 2, 7, 3):

<tf.Tensor: shape=(2, 2, 7, 3), dtype=float32, numpy=
array([[[[0.  , 0.5 , 0.  ],
         [0.  , 0.  , 0.05],
         [0.2 , 0.  , 0.  ],
         [0.  , 0.  , 0.7 ],
         [0.2 , 0.  , 0.  ],
         [0.  , 0.  , 0.2 ],
         [0.5 , 0.  , 0.  ]],

        [[0.  , 0.  , 0.4 ],
         [0.  , 0.9 , 0.  ],
         [0.  , 0.3 , 0.  ],
         [0.  , 0.  , 0.7 ],
         [0.  , 0.  , 0.1 ],
         [0.4 , 0.  , 0.  ],
         [0.5 , 0.  , 0.  ]]],


       [[[0.  , 0.5 , 0.  ],
         [0.  , 0.  , 0.1 ],
         [0.4 , 0.  , 0.  ],
         [0.  , 0.  , 0.5 ],
         [0.05, 0.  , 0.  ],
         [0.  , 0.  , 0.5 ],
         [0.1 , 0.  , 0.  ]],

        [[0.  , 0.  , 0.4 ],
         [0.  , 0.8 , 0.  ],
         [0.  , 0.4 , 0.  ],
         [0.  , 0.  , 0.5 ],
         [0.  , 0.  , 0.05],
         [0.2 , 0.  , 0.  ],
         [0.1 , 0.  , 0.  ]]]], dtype=float32)>

and when you sum over the last axis, you get:

array([[[0.5 , 0.05, 0.2 , 0.7 , 0.2 , 0.2 , 0.5 ],
        [0.4 , 0.9 , 0.3 , 0.7 , 0.1 , 0.4 , 0.5 ]],

       [[0.5 , 0.1 , 0.4 , 0.5 , 0.05, 0.5 , 0.1 ],
        [0.4 , 0.8 , 0.4 , 0.5 , 0.05, 0.2 , 0.1 ]]], dtype=float32)

# log_p.shape
# (2, 2, 7)
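The broadcasting above follows NumPy/TensorFlow’s right-alignment rule; a quick shape-only sketch (zeros stand in for the actual test values):

```python
import numpy as np

preds = np.zeros((2, 1, 7, 3))      # the contrived 4-D preds
one_hot = np.zeros((2, 7, 3))       # one-hot encoded targets

# Right-align the shapes: (2, 1, 7, 3)
#                            (2, 7, 3)
# The length-1 axis stretches to 2, giving (2, 2, 7, 3).
prod = preds * one_hot
print(prod.shape)                          # (2, 2, 7, 3)
print(np.sum(prod, axis=-1).shape)         # (2, 2, 7)
```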

Just to continue the exercise further (it might help you or others understand what is asked of you):
# Identify non-padding elements in the target
array([[0., 1., 1., 1., 1., 1., 1.],
       [1., 0., 0., 1., 1., 1., 1.]])

# non_pad.shape
# (2, 7)

then (in this test case, the broadcasting again comes into play):

# Apply non-padding mask to log probabilities to exclude padding
array([[[0.        , 0.05      , 0.2       , 0.69999999, 0.2       ,
         0.2       , 0.5       ],
        [0.40000001, 0.        , 0.        , 0.69999999, 0.1       ,
         0.40000001, 0.5       ]],

       [[0.        , 0.1       , 0.40000001, 0.5       , 0.05      ,
         0.5       , 0.1       ],
        [0.40000001, 0.        , 0.        , 0.5       , 0.05      ,
         0.2       , 0.1       ]]])

# log_p.shape
# (2, 2, 7)

then:

# Calculate the log perplexity by taking the sum of log probabilities and dividing by the sum of non-padding elements

# numerator:
array([[1.85      , 2.1       ],
       [1.65000001, 1.25000001]])
# .shape
# (2, 2)

# denominator:
array([6., 5.])
# .shape
# (2,)

# log_ppx
array([[0.30833333, 0.42      ],
       [0.275     , 0.25      ]])
# .shape
# (2, 2)

lastly:

# Compute the mean of log perplexity
# log_ppx
0.31333333427707355
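The numbers above can be checked with a few lines of NumPy, using the masked log_p and non_pad arrays exactly as shown:

```python
import numpy as np

# Masked log probabilities from the contrived (2, 1, 7, 3) case, shape (2, 2, 7).
log_p = np.array([[[0.0, 0.05, 0.2, 0.7, 0.2, 0.2, 0.5],
                   [0.4, 0.0, 0.0, 0.7, 0.1, 0.4, 0.5]],
                  [[0.0, 0.1, 0.4, 0.5, 0.05, 0.5, 0.1],
                   [0.4, 0.0, 0.0, 0.5, 0.05, 0.2, 0.1]]])
non_pad = np.array([[0., 1., 1., 1., 1., 1., 1.],
                    [1., 0., 0., 1., 1., 1., 1.]])

numerator = np.sum(log_p, axis=-1)       # shape (2, 2)
denominator = np.sum(non_pad, axis=-1)   # shape (2,); broadcasts over (2, 2)
log_ppx = numerator / denominator        # shape (2, 2)
print(np.mean(log_ppx))                  # ≈ 0.31333
```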

Again, I reiterate that this unit test is contrived and the number of dimensions should have been 3 (batch being the first), but this might help you get the idea.

Cheers

2 Likes

By the way, the unit test should have looked like:

# preds
<tf.Tensor: shape=(2, 7, 3), dtype=float32, numpy=
array([[[0.1 , 0.5 , 0.4 ],
        [0.05, 0.9 , 0.05],
        [0.2 , 0.3 , 0.5 ],
        [0.1 , 0.2 , 0.7 ],
        [0.2 , 0.8 , 0.1 ],
        [0.4 , 0.4 , 0.2 ],
        [0.5 , 0.  , 0.5 ]],

       [[0.1 , 0.5 , 0.4 ],
        [0.2 , 0.8 , 0.1 ],
        [0.4 , 0.4 , 0.2 ],
        [0.5 , 0.  , 0.5 ],
        [0.05, 0.9 , 0.05],
        [0.2 , 0.3 , 0.5 ],
        [0.1 , 0.2 , 0.7 ]]], dtype=float32)>

# target
<tf.Tensor: shape=(2, 7), dtype=int32, numpy=
array([[1, 2, 0, 2, 0, 2, 0],
       [2, 1, 1, 2, 2, 0, 0]], dtype=int32)>

########################################################
# Calculate log probabilities for predictions using one-hot encoding
# target one_hot
<tf.Tensor: shape=(2, 7, 3), dtype=float32, numpy=
array([[[0., 1., 0.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.]],

       [[0., 0., 1.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.],
        [1., 0., 0.],
        [1., 0., 0.]]], dtype=float32)>

# preds * target one_hot
<tf.Tensor: shape=(2, 7, 3), dtype=float32, numpy=
array([[[0.  , 0.5 , 0.  ],
        [0.  , 0.  , 0.05],
        [0.2 , 0.  , 0.  ],
        [0.  , 0.  , 0.7 ],
        [0.2 , 0.  , 0.  ],
        [0.  , 0.  , 0.2 ],
        [0.5 , 0.  , 0.  ]],

       [[0.  , 0.  , 0.4 ],
        [0.  , 0.8 , 0.  ],
        [0.  , 0.4 , 0.  ],
        [0.  , 0.  , 0.5 ],
        [0.  , 0.  , 0.05],
        [0.2 , 0.  , 0.  ],
        [0.1 , 0.  , 0.  ]]], dtype=float32)>

# log_p is sum over an axis
# log_p
array([[0.5 , 0.05, 0.2 , 0.7 , 0.2 , 0.2 , 0.5 ],
       [0.4 , 0.8 , 0.4 , 0.5 , 0.05, 0.2 , 0.1 ]], dtype=float32)
# log_p.shape
# (2, 7)

########################################################
# Identify non-padding elements in the target
# non_pad
array([[0., 1., 1., 1., 1., 1., 1.],
       [1., 0., 0., 1., 1., 1., 1.]])
# non_pad.shape
# (2, 7)

########################################################
# Apply non-padding mask to log probabilities to exclude padding
# NOTE that log_p and non_pad shapes now match!
# log_p
array([[0.        , 0.05      , 0.2       , 0.69999999, 0.2       ,
        0.2       , 0.5       ],
       [0.40000001, 0.        , 0.        , 0.5       , 0.05      ,
        0.2       , 0.1       ]])
# log_p.shape
# (2, 7)

########################################################
# Calculate the log perplexity by taking the sum of log probabilities and dividing by the sum of non-padding elements
# numerator
array([1.85      , 1.25000001])

# denominator
array([6., 5.])

# log_ppx results in:
array([0.30833333, 0.25      ])
# log_ppx.shape
# (2,) # note only the batch dimension left

########################################################
# Compute the mean of log perplexity
# log_ppx (mean perplexity of the batch)
0.27916666759798925
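For reference, the whole chain for this corrected test case can be reproduced in NumPy. This is only a sketch of the steps walked through above: the pad token id of 1 is inferred from the non_pad mask shown, and np stands in for the TensorFlow ops used in the notebook.

```python
import numpy as np

preds = np.array([[[0.1, 0.5, 0.4], [0.05, 0.9, 0.05], [0.2, 0.3, 0.5],
                   [0.1, 0.2, 0.7], [0.2, 0.8, 0.1], [0.4, 0.4, 0.2],
                   [0.5, 0.0, 0.5]],
                  [[0.1, 0.5, 0.4], [0.2, 0.8, 0.1], [0.4, 0.4, 0.2],
                   [0.5, 0.0, 0.5], [0.05, 0.9, 0.05], [0.2, 0.3, 0.5],
                   [0.1, 0.2, 0.7]]])                    # (2, 7, 3)
target = np.array([[1, 2, 0, 2, 0, 2, 0],
                   [2, 1, 1, 2, 2, 0, 0]])               # (2, 7)

one_hot = np.eye(3)[target]                              # (2, 7, 3)
log_p = np.sum(preds * one_hot, axis=-1)                 # (2, 7)
non_pad = (target != 1).astype(float)                    # pad id inferred as 1
log_p = log_p * non_pad                                  # zero out padding
log_ppx = np.sum(log_p, axis=-1) / np.sum(non_pad, axis=-1)   # (2,)
print(np.mean(log_ppx))                                  # ≈ 0.27917
```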

Cheers

1 Like

Thank you so much for your comprehensive and very prompt reply - greatly greatly appreciated!

One question, if I may: why does the approach of squeezing out that extra dimension fail?

Thanks, again.

@Cawnpore_Charlie You’re welcome! I like good questions and yours was good.

It depends on your implementation - the subsequent computations that you do.

For example, in the last unit test case np.squeeze(preds, axis=1) would result in shape (2, 7, 3), but the same code would raise an error for the other test cases (where there is no axis of length one to squeeze). Also, even if you squeezed the tensor to (2, 7, 3), depending on your implementation you would not arrive at the expected result for that unit test (see the calculations above: 0.279 != 0.313).
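A quick illustration of the first pitfall (shape-only, with zeros as stand-ins for the test values):

```python
import numpy as np

# The contrived case: axis 1 has length 1, so squeezing works.
print(np.squeeze(np.zeros((2, 1, 7, 3)), axis=1).shape)   # (2, 7, 3)

# The normal cases: axis 1 has length 5, so the same call raises.
try:
    np.squeeze(np.zeros((1, 5, 3)), axis=1)
except ValueError as err:
    print("cannot squeeze:", err)
```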

Cheers

Thanks for your prompt reply.

I tried np.squeeze only for cases where the difference in dimensions between preds and target was more than 1 (i.e., len(preds.shape) - len(target.shape) > 1), so that the computation for the other test cases would remain unaffected, but it still failed the last case with 0.279 != 0.313, as you point out.

I don’t quite get the educational value of this test case at all (other than teaching about broadcasting in high dimensions etc). The intuition of a 4-D preds is completely non-obvious to me - I would submit that this test case be either removed or modified to better conform to the lecture material as it results in quite a lot of churn for no obvious educational value.

I greatly appreciate your meticulously detailed and prompt responses. It is extremely reassuring to know that someone of your caliber is closely monitoring these boards. Thanks, again.

You’re right, and I submitted a request to correct the unit test case right after answering your first question. Hopefully it will be more adequate soon, thanks to you :slight_smile: (and Vijay, whose post I missed previously)

1 Like