ValueError: operands could not be broadcast together with shapes (2,7) (2,)

C3W1_Assignment
Last Checkpoint: 13/12/2023

#UNIT TESTS
w1_unittest.test_test_model(log_perplexity)

Description: One of the unit test cases injects an invalid value into the dimensions of the preds tensor; as a result, the target sequence is corrupted.

Example below:
Preds Shape: (2, 1, 7, 3) Targets shape:(2, 7) (2, 7, 3)
Log Shape: (2, 2, 7), non_pad Shape (2, 7)

Expected preds shape: (2, 7, 3)
Erroneous preds shape: (2, 1, 7, 3)

Preds Shape: (1, 5, 3) Targets shape:(1, 5) (1, 5, 3)
Log Shape: (1, 5), non_pad Shape (1, 5)

Preds Shape: (1, 5, 3) Targets shape:(1, 5) (1, 5, 3)
Log Shape: (1, 5), non_pad Shape (1, 5)

Preds Shape: (1, 5, 3) Targets shape:(1, 5) (1, 5, 3)
Log Shape: (1, 5), non_pad Shape (1, 5)

Preds Shape: (1, 5, 3) Targets shape:(1, 5) (1, 5, 3)
Log Shape: (1, 5), non_pad Shape (1, 5)

Preds Shape: (1, 8, 5) Targets shape:(1, 8) (1, 8, 5)
Log Shape: (1, 8), non_pad Shape (1, 8)

Preds Shape: (1, 8, 5) Targets shape:(1, 8) (1, 8, 5)
Log Shape: (1, 8), non_pad Shape (1, 8)

Preds Shape: (1, 8, 5) Targets shape:(1, 8) (1, 8, 5)
Log Shape: (1, 8), non_pad Shape (1, 8)

Preds Shape: (1, 7, 3) Targets shape:(1, 7) (1, 7, 3)
Log Shape: (1, 7), non_pad Shape (1, 7)

Preds Shape: (2, 1, 7, 3) Targets shape:(2, 7) (2, 7, 3)
Log Shape: (2, 2, 7), non_pad Shape (2, 7)


ValueError Traceback (most recent call last)
Cell In[144], line 2
1 #UNIT TESTS

(1, 7, 3) 3
Preds Shape: (1, 7, 3) Targets shape:(1, 7) (1, 7, 3)
Log Shape: (1, 7), non_pad Shape (1, 7)

(2, 1, 7, 3) 3
Preds Shape: (2, 1, 7, 3) Targets shape:(2, 7) (2, 7, 3)
Log Shape: (2, 2, 7), non_pad Shape (2, 7)

1 Like

Hi @vijayneuralnet

Check this explanation to understand the calculations.

Having said that, the most common mistake is forgetting axis= -1 where appropriate (three instances of np.sum).
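For anyone unsure what that hint means, here is a minimal NumPy sketch (with made-up probabilities, not the assignment's actual test data) showing why the sum must run over the last axis, i.e. the vocabulary axis:

```python
import numpy as np

# Toy batch: 1 sequence of 5 tokens over a 3-word vocabulary (made-up values).
preds = np.array([[[0.1, 0.5, 0.4],
                   [0.05, 0.9, 0.05],
                   [0.2, 0.3, 0.5],
                   [0.1, 0.2, 0.7],
                   [0.2, 0.8, 0.1]]])          # shape (1, 5, 3)
target = np.array([[1, 1, 2, 2, 0]])           # shape (1, 5)

one_hot = np.eye(3)[target]                    # shape (1, 5, 3)
log_p = np.sum(preds * one_hot, axis=-1)       # shape (1, 5): one value per token
print(log_p)                                   # [[0.5 0.9 0.5 0.7 0.2]]
print(np.sum(preds * one_hot))                 # without axis=-1: a single scalar
```

Summing without axis=-1 collapses everything to one number, which no longer has the per-token shape that the non-padding mask expects.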

Cheers

2 Likes

Hi @arvyzukai ,

I have added this piece of code for your reference…

This error is not emanating from here but from the unit test case for preds with dimensions (2, 1, 7, 3). I have tried tf.squeeze(preds, axis=1) to bring it to (2, 7, 3), but the error still persists.

This is unrelated to np.sum(preds*target,axis=-1)

I have tried handling case to identify the preds dimension.

### START CODE HERE ###
    shp1,shp2 = preds.shape, target.shape
    mism = [i for i,w in enumerate(shp1.as_list()) if w not in shp2.as_list()]
    print(shp1,shp2,mism)
    if len(mism)==1:
        target_reshaped = tf.one_hot(target,preds.shape[mism[0]])
    else:
        preds = tf.squeeze(preds,axis=mism[0])
        target_reshaped = tf.one_hot(target,preds.shape[-1])

#     print(tf.reshape(target,shape=new_shape))
  # Calculate log probabilities for predictions using one-hot encoding
    print(f"Preds Shape: {preds.shape} Targets shape:{target.shape} {tf.one_hot(target,preds.shape[-1]).shape}")
    log_p = np.sum(preds * target_reshaped ,axis=-1) # HINT: tf.one_hot() should replace one of the Nones
    
1 Like

Hi @vijayneuralnet

Note the instructions for Exercise 5 - log_perplexity:

You can use tf.one_hot to transform the target into the same dimension. You then multiply them and sum them.

Your implementation (for target_reshaped) is very messy and hard to follow; a cleaner solution would look like:
tf.one_hot(?, ?.shape[-1])
This one line of code is enough to get your target_reshaped.

Cheers

1 Like

Issue is here:

This test case works:
Preds Shape: (1, 7, 3) Targets shape: (1, 7) Target_reshaped: (1, 7, 3)
Log Shape: (1, 7), non_pad Shape (1, 7)

The test case below is invalid:
Preds Shape: (2, 1, 7, 3) Target shape: (2, 7) Target_Reshaped: (2, 7, 3)

Note: preds.shape[-1] only addresses the last dimension, while the invalid case also differs in an extra dimension.

It fails because of the extra dimension of size 1; for this to work, that dimension must be squeezed out of preds.

Since preds and target_reshaped have different numbers of dimensions, this implementation fails.

#  Reshaped the Targets
target_reshaped = tf.one_hot(target,preds.shape[-1])


print(f"Preds Shape: {preds.shape} Targets shape:{target.shape} Target_Reshaped: {tf.one_hot(target,preds.shape[-1]).shape}")

#  Calculate log probabilities for predictions using one-hot encoding
log_p = np.sum(preds * target_reshaped, axis=-1)
1 Like

Hi, @arvyzukai

I’m currently working on the section about calculating the log perplexity of a language model. Unfortunately, I’ve hit a roadblock due to a recurring InvalidArgumentError.

Error Context:
The error occurs in a cell where the split_input_target function is used. Here’s the problematic code snippet:

eval_text = "\n".join(eval_lines)
eval_ids = line_to_tensor([eval_text], vocab)
input_ids, target_ids = split_input_target(tf.squeeze(eval_ids, axis=0))

The error message indicates an issue with slicing a scalar input:

InvalidArgumentError: Attempting to slice scalar input. [Op:StridedSlice]

Troubleshooting Steps:
I’ve tried debugging by printing tensor shapes and examining variable values, but I’m constrained by the course guidelines, which restrict modifications only to certain sections of the notebook.

Reproducibility:
For your convenience, I’ve isolated the error in the attached minimal code snippet, which can be run independently to reproduce the issue.

Request for Guidance:
Considering the constraints on code modification, I would greatly appreciate your insights on:

  • Specific aspects of the data processing pipeline that might need revision.
  • Any potential issues with the input text formatting.
  • Suggestions for alternative debugging strategies within the allowed code sections.

Your expertise and guidance would be invaluable in helping me understand and resolve this issue.

1 Like

Hi @RyeToast

The problem most probably lies in your Exercise 01 - line_to_tensor(). This error says that you tried to slice a scalar (for example, 4[:3], which is not possible).

Exercise 01 is a very simple exercise which you should’ve implemented with two lines of code (identical to the code a couple of cells above). This function should’ve returned an ids variable which you could slice (for example, ids[:3] should be possible) because ids should’ve been a tensor.
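To see the difference in plain Python (a hypothetical illustration, not the assignment code): slicing works on a sequence of ids, but not on a scalar:

```python
# A scalar cannot be sliced; that is exactly what the
# "Attempting to slice scalar input" error is complaining about.
scalar = 4
try:
    scalar[:3]
except TypeError as err:
    print("cannot slice a scalar:", err)

# A sequence of ids (what line_to_tensor should return, as a tensor) can be sliced.
ids = [72, 101, 108, 108, 111]
print(ids[:3])   # [72, 101, 108]
```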

Cheers

1 Like

I am running into the exact same problem which appears to be with a test case which has:
a. preds dimension (2, 1, 7, 3) i.e., FOUR dimensions while
b. targets dimension is only 2 dimensions (2, 7)
The case where the number of preds dimensions differs from the number of target dimensions by 2 was not covered in the lecture or the lab. Using np.squeeze(preds) to get to shape (2, 7, 3) still leads to an error.
Please HELP. I would really appreciate a clarification on this point. Thank you.

Hi @Cawnpore_Charlie

This unit test is a bit contrived (most probably accidentally, though it could be intentional).

Let me explain the last test case (2,1,7,3) that you’re having problems with. (Usually, this is not a normal input shape in NLP but it is used in the test case).

So, in the unit test case the preds are:

<tf.Tensor: shape=(2, 1, 7, 3), dtype=float32, numpy=
array([[[[0.1 , 0.5 , 0.4 ],
         [0.05, 0.9 , 0.05],
         [0.2 , 0.3 , 0.5 ],
         [0.1 , 0.2 , 0.7 ],
         [0.2 , 0.8 , 0.1 ],
         [0.4 , 0.4 , 0.2 ],
         [0.5 , 0.  , 0.5 ]]],


       [[[0.1 , 0.5 , 0.4 ],
         [0.2 , 0.8 , 0.1 ],
         [0.4 , 0.4 , 0.2 ],
         [0.5 , 0.  , 0.5 ],
         [0.05, 0.9 , 0.05],
         [0.2 , 0.3 , 0.5 ],
         [0.1 , 0.2 , 0.7 ]]]], dtype=float32)>

the targets are:

<tf.Tensor: shape=(2, 7), dtype=int32, numpy=
array([[1, 2, 0, 2, 0, 2, 0],
       [2, 1, 1, 2, 2, 0, 0]], dtype=int32)>

when you one_hot encode the targets, you get:

<tf.Tensor: shape=(2, 7, 3), dtype=float32, numpy=
array([[[0., 1., 0.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.]],

       [[0., 0., 1.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.],
        [1., 0., 0.],
        [1., 0., 0.]]], dtype=float32)>

when you multiply that with preds, broadcasting comes into play, so you get shape (2, 2, 7, 3):

<tf.Tensor: shape=(2, 2, 7, 3), dtype=float32, numpy=
array([[[[0.  , 0.5 , 0.  ],
         [0.  , 0.  , 0.05],
         [0.2 , 0.  , 0.  ],
         [0.  , 0.  , 0.7 ],
         [0.2 , 0.  , 0.  ],
         [0.  , 0.  , 0.2 ],
         [0.5 , 0.  , 0.  ]],

        [[0.  , 0.  , 0.4 ],
         [0.  , 0.9 , 0.  ],
         [0.  , 0.3 , 0.  ],
         [0.  , 0.  , 0.7 ],
         [0.  , 0.  , 0.1 ],
         [0.4 , 0.  , 0.  ],
         [0.5 , 0.  , 0.  ]]],


       [[[0.  , 0.5 , 0.  ],
         [0.  , 0.  , 0.1 ],
         [0.4 , 0.  , 0.  ],
         [0.  , 0.  , 0.5 ],
         [0.05, 0.  , 0.  ],
         [0.  , 0.  , 0.5 ],
         [0.1 , 0.  , 0.  ]],

        [[0.  , 0.  , 0.4 ],
         [0.  , 0.8 , 0.  ],
         [0.  , 0.4 , 0.  ],
         [0.  , 0.  , 0.5 ],
         [0.  , 0.  , 0.05],
         [0.2 , 0.  , 0.  ],
         [0.1 , 0.  , 0.  ]]]], dtype=float32)>

and when you sum over the last axis, you get:

array([[[0.5 , 0.05, 0.2 , 0.7 , 0.2 , 0.2 , 0.5 ],
        [0.4 , 0.9 , 0.3 , 0.7 , 0.1 , 0.4 , 0.5 ]],

       [[0.5 , 0.1 , 0.4 , 0.5 , 0.05, 0.5 , 0.1 ],
        [0.4 , 0.8 , 0.4 , 0.5 , 0.05, 0.2 , 0.1 ]]], dtype=float32)

# log_p.shape
# (2, 2, 7)
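The broadcasting above follows NumPy/TensorFlow’s right-alignment rule; a quick shape-only sketch (zeros stand in for the actual test values):

```python
import numpy as np

preds = np.zeros((2, 1, 7, 3))      # the contrived 4-D preds
one_hot = np.zeros((2, 7, 3))       # one-hot encoded targets

# Right-align the shapes: (2, 1, 7, 3)
#                            (2, 7, 3)
# The length-1 axis stretches to 2, giving (2, 2, 7, 3).
prod = preds * one_hot
print(prod.shape)                          # (2, 2, 7, 3)
print(np.sum(prod, axis=-1).shape)         # (2, 2, 7)
```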

Just to continue the exercise further (it might help you or others understand what is asked of you):
# Identify non-padding elements in the target
array([[0., 1., 1., 1., 1., 1., 1.],
       [1., 0., 0., 1., 1., 1., 1.]])

# non_pad.shape
# (2, 7)

then (in this test case, the broadcasting again comes into play):

# Apply non-padding mask to log probabilities to exclude padding
array([[[0.        , 0.05      , 0.2       , 0.69999999, 0.2       ,
         0.2       , 0.5       ],
        [0.40000001, 0.        , 0.        , 0.69999999, 0.1       ,
         0.40000001, 0.5       ]],

       [[0.        , 0.1       , 0.40000001, 0.5       , 0.05      ,
         0.5       , 0.1       ],
        [0.40000001, 0.        , 0.        , 0.5       , 0.05      ,
         0.2       , 0.1       ]]])

# log_p.shape
# (2, 2, 7)

then:

# Calculate the log perplexity by taking the sum of log probabilities and dividing by the sum of non-padding elements

# numerator:
array([[1.85      , 2.1       ],
       [1.65000001, 1.25000001]])
# .shape
# (2, 2)

# denominator:
array([6., 5.])
# .shape
# (2,)

# log_ppx
array([[0.30833333, 0.42      ],
       [0.275     , 0.25      ]])
# .shape
# (2, 2)

lastly:

# Compute the mean of log perplexity
# log_ppx
0.31333333427707355
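The numbers above can be checked with a few lines of NumPy, using the masked log_p and non_pad arrays exactly as shown:

```python
import numpy as np

# Masked log probabilities from the contrived (2, 1, 7, 3) case, shape (2, 2, 7).
log_p = np.array([[[0.0, 0.05, 0.2, 0.7, 0.2, 0.2, 0.5],
                   [0.4, 0.0, 0.0, 0.7, 0.1, 0.4, 0.5]],
                  [[0.0, 0.1, 0.4, 0.5, 0.05, 0.5, 0.1],
                   [0.4, 0.0, 0.0, 0.5, 0.05, 0.2, 0.1]]])
non_pad = np.array([[0., 1., 1., 1., 1., 1., 1.],
                    [1., 0., 0., 1., 1., 1., 1.]])

numerator = np.sum(log_p, axis=-1)       # shape (2, 2)
denominator = np.sum(non_pad, axis=-1)   # shape (2,); broadcasts over (2, 2)
log_ppx = numerator / denominator        # shape (2, 2)
print(np.mean(log_ppx))                  # ≈ 0.31333
```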

Again, I reiterate that this unit test is contrived and the number of dimensions should have been 3 (batch being the first), but this might help you get the idea.

Cheers

2 Likes

By the way, the unit test should have looked like:

# preds
<tf.Tensor: shape=(2, 7, 3), dtype=float32, numpy=
array([[[0.1 , 0.5 , 0.4 ],
        [0.05, 0.9 , 0.05],
        [0.2 , 0.3 , 0.5 ],
        [0.1 , 0.2 , 0.7 ],
        [0.2 , 0.8 , 0.1 ],
        [0.4 , 0.4 , 0.2 ],
        [0.5 , 0.  , 0.5 ]],

       [[0.1 , 0.5 , 0.4 ],
        [0.2 , 0.8 , 0.1 ],
        [0.4 , 0.4 , 0.2 ],
        [0.5 , 0.  , 0.5 ],
        [0.05, 0.9 , 0.05],
        [0.2 , 0.3 , 0.5 ],
        [0.1 , 0.2 , 0.7 ]]], dtype=float32)>

# target
<tf.Tensor: shape=(2, 7), dtype=int32, numpy=
array([[1, 2, 0, 2, 0, 2, 0],
       [2, 1, 1, 2, 2, 0, 0]], dtype=int32)>

########################################################
# Calculate log probabilities for predictions using one-hot encoding
# target one_hot
<tf.Tensor: shape=(2, 7, 3), dtype=float32, numpy=
array([[[0., 1., 0.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.],
        [0., 0., 1.],
        [1., 0., 0.]],

       [[0., 0., 1.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.],
        [1., 0., 0.],
        [1., 0., 0.]]], dtype=float32)>

# preds * target one_hot
<tf.Tensor: shape=(2, 7, 3), dtype=float32, numpy=
array([[[0.  , 0.5 , 0.  ],
        [0.  , 0.  , 0.05],
        [0.2 , 0.  , 0.  ],
        [0.  , 0.  , 0.7 ],
        [0.2 , 0.  , 0.  ],
        [0.  , 0.  , 0.2 ],
        [0.5 , 0.  , 0.  ]],

       [[0.  , 0.  , 0.4 ],
        [0.  , 0.8 , 0.  ],
        [0.  , 0.4 , 0.  ],
        [0.  , 0.  , 0.5 ],
        [0.  , 0.  , 0.05],
        [0.2 , 0.  , 0.  ],
        [0.1 , 0.  , 0.  ]]], dtype=float32)>

# log_p is sum over an axis
# log_p
array([[0.5 , 0.05, 0.2 , 0.7 , 0.2 , 0.2 , 0.5 ],
       [0.4 , 0.8 , 0.4 , 0.5 , 0.05, 0.2 , 0.1 ]], dtype=float32)
# log_p.shape
# (2, 7)

########################################################
# Identify non-padding elements in the target
# non_pad
array([[0., 1., 1., 1., 1., 1., 1.],
       [1., 0., 0., 1., 1., 1., 1.]])
# non_pad.shape
# (2, 7)

########################################################
# Apply non-padding mask to log probabilities to exclude padding
# NOTE that log_p and non_pad shapes now match!
# log_p
array([[0.        , 0.05      , 0.2       , 0.69999999, 0.2       ,
        0.2       , 0.5       ],
       [0.40000001, 0.        , 0.        , 0.5       , 0.05      ,
        0.2       , 0.1       ]])
# log_p.shape
# (2, 7)

########################################################
# Calculate the log perplexity by taking the sum of log probabilities and dividing by the sum of non-padding elements
# numerator
array([1.85      , 1.25000001])

# denominator
array([6., 5.])

# log_ppx results in:
array([0.30833333, 0.25      ])
# log_ppx.shape
# (2,) # note only the batch dimension left

########################################################
# Compute the mean of log perplexity
# log_ppx (mean perplexity of the batch)
0.27916666759798925
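For reference, the whole chain for this corrected test case can be reproduced in NumPy. This is only a sketch of the steps walked through above: the pad token id of 1 is inferred from the non_pad mask shown, and np stands in for the TensorFlow ops used in the notebook.

```python
import numpy as np

preds = np.array([[[0.1, 0.5, 0.4], [0.05, 0.9, 0.05], [0.2, 0.3, 0.5],
                   [0.1, 0.2, 0.7], [0.2, 0.8, 0.1], [0.4, 0.4, 0.2],
                   [0.5, 0.0, 0.5]],
                  [[0.1, 0.5, 0.4], [0.2, 0.8, 0.1], [0.4, 0.4, 0.2],
                   [0.5, 0.0, 0.5], [0.05, 0.9, 0.05], [0.2, 0.3, 0.5],
                   [0.1, 0.2, 0.7]]])                    # (2, 7, 3)
target = np.array([[1, 2, 0, 2, 0, 2, 0],
                   [2, 1, 1, 2, 2, 0, 0]])               # (2, 7)

one_hot = np.eye(3)[target]                              # (2, 7, 3)
log_p = np.sum(preds * one_hot, axis=-1)                 # (2, 7)
non_pad = (target != 1).astype(float)                    # pad id inferred as 1
log_p = log_p * non_pad                                  # zero out padding
log_ppx = np.sum(log_p, axis=-1) / np.sum(non_pad, axis=-1)   # (2,)
print(np.mean(log_ppx))                                  # ≈ 0.27917
```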

Cheers

1 Like

Thank you so much for your comprehensive and very prompt reply - greatly greatly appreciated!

One question, if I may: why does the approach of squeezing out that extra dimension fail?

Thanks, again.

@Cawnpore_Charlie You’re welcome! I like good questions and yours was good.

It depends on your implementation - the subsequent computations that you do.

For example, in the last unit test case np.squeeze(preds, axis=1) would result in shape (2, 7, 3), but the same code would raise an error for the other test cases (where there is no axis of length one to squeeze). Also, even if you squeezed the tensor to (2, 7, 3), depending on your implementation you would not arrive at the expected result for that unit test (see the calculations above: 0.279 != 0.313).
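A quick illustration of the first pitfall (shape-only, with zeros as stand-ins for the test values):

```python
import numpy as np

# The contrived case: axis 1 has length 1, so squeezing works.
print(np.squeeze(np.zeros((2, 1, 7, 3)), axis=1).shape)   # (2, 7, 3)

# The normal cases: axis 1 has length 5, so the same call raises.
try:
    np.squeeze(np.zeros((1, 5, 3)), axis=1)
except ValueError as err:
    print("cannot squeeze:", err)
```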

Cheers

Thanks for your prompt reply.

I tried np.squeeze only for cases where the difference in dimensions between preds and target was more than 1 (i.e., len(preds.shape) - len(target.shape) > 1), so that the computation for the other test cases would remain unaffected, but it still failed the last case with 0.279 != 0.313, as you point out.

I don’t quite get the educational value of this test case at all (other than teaching about broadcasting in high dimensions etc). The intuition of a 4-D preds is completely non-obvious to me - I would submit that this test case be either removed or modified to better conform to the lecture material as it results in quite a lot of churn for no obvious educational value.

I greatly appreciate your meticulously detailed and prompt responses. It is extremely reassuring to know that someone of your caliber is closely monitoring these boards. Thanks, again.

You’re right, and I submitted a request to correct the unit test case right after answering your first question. Hopefully it will be more adequate soon, thanks to you :slight_smile: (and Vijay, whose post I missed previously)

1 Like