According to the doc string, the input x has shape (n_batch X n_heads, seqlen, d_head).
So to reshape x to (n_batch, n_heads, seqlen, d_head), I use
n_batch = x.shape[0] // n_heads
x = jnp.reshape(x, (n_batch, n_heads, seqlen, d_head))
However, the unit test failed with a stack trace. If I use
x = jnp.reshape(x, (-1, n_heads, seqlen, d_head))
then it passes.
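For reference, a minimal sketch of the two approaches on a well-formed input (dummy zeros tensor; the n_batch = 2 value is assumed purely for illustration):

import jax.numpy as jnp

n_heads, seqlen, d_head = 3, 2, 2
x = jnp.zeros((6, seqlen, d_head))  # well-formed: leading dim = n_batch * n_heads = 2 * 3

n_batch = x.shape[0] // n_heads     # 6 // 3 = 2
print(jnp.reshape(x, (n_batch, n_heads, seqlen, d_head)).shape)  # (2, 3, 2, 2)
print(jnp.reshape(x, (-1, n_heads, seqlen, d_head)).shape)       # (2, 3, 2, 2), identical

On a well-formed input the two reshapes agree, so the divergence below comes from the input itself.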
Looking into the failed test case, I found:
x.shape = (6, 2, 3)
However, n_heads = 3, seqlen = 2, d_head = 2.
So, per the doc string, n_batch should be 6 // n_heads = 2.
Using x = jnp.reshape(x, (-1, n_heads, seqlen, d_head)), we have
x.shape = (3, 3, 2, 2) after reshaping. It passes the test, but that means n_batch is 3, not the expected 2.
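The element counts explain why the -1 gets inferred as 3 rather than 2 (a quick check using the shapes reported above):

import jax.numpy as jnp

x = jnp.zeros((6, 2, 3))           # the test tensor: 6 * 2 * 3 = 36 elements
y = jnp.reshape(x, (-1, 3, 2, 2))  # -1 is inferred as 36 // (3 * 2 * 2) = 3
print(y.shape)                     # (3, 3, 2, 2)
# jnp.reshape(x, (2, 3, 2, 2)) would raise an error: 2 * 3 * 2 * 2 = 24 != 36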
If you have a tensor x of shape (6, 2, 3) and you reshape it with
jnp.reshape(x, (-1, 3, 2, 3))
you get an output of shape (2, 3, 2, 3), the same as with
jnp.reshape(x, (2, 3, 2, 3))
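That checks out numerically (a quick sketch with a dummy tensor):

import jax.numpy as jnp

x = jnp.zeros((6, 2, 3))                    # 36 elements
print(jnp.reshape(x, (-1, 3, 2, 3)).shape)  # (2, 3, 2, 3): -1 inferred as 36 // 18 = 2
print(jnp.reshape(x, (2, 3, 2, 3)).shape)   # (2, 3, 2, 3), the same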
Yes, but the issue is that the result shape must be (3, 3, 2, 2) to pass the test; (2, 3, 2, 3) won't pass. The reason is that the x in the test case is invalid: with n_heads = 3, seqlen = 2, d_head = 2, it should have shape (n_batch X 3, 2, 2), but its last dimension is 3, so it does not follow the (n_batch X n_heads, seqlen, d_head) shape from the doc string.
Nice catch! You are absolutely correct. The second part ("test dummy tensors 2") of the unit test function test_compute_attention_output_closure is implemented wrong. I will submit an issue to get it fixed.
@arvyzukai - This may be only a semi-relevant question, but is there an inconsistency between the values for d_head and the dimensions of x?
x has (or should have) shape (n_batch X n_heads, seqlen, d_head).
So, both
x = jnp.reshape(x, (-1, n_heads, seqlen, d_head)), and
x = jnp.reshape(x, (-1, n_heads, seqlen, x.shape[-1]))
should be equally valid. However, the first option passes all 4 tests, while the second passes only 3 of the 4. Is this inconsistency the issue, or have I misunderstood something? (The sketch at the end of this post illustrates where they diverge.)
I know and understand this has been an ongoing and difficult issue to solve. I was just thinking that, since d_head and x.shape[-1] should always be the same, their inconsistency might point toward the problem … or not. You guys know this code and topic far better than I do. Anyway, thanks for responding.
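To make the suspected inconsistency concrete, here is a small sketch using the shapes reported earlier in the thread (dummy values):

import jax.numpy as jnp

n_heads, seqlen, d_head = 3, 2, 2
x = jnp.zeros((6, 2, 3))  # x.shape[-1] = 3, which differs from d_head = 2

print(jnp.reshape(x, (-1, n_heads, seqlen, d_head)).shape)       # (3, 3, 2, 2): what the test expects
print(jnp.reshape(x, (-1, n_heads, seqlen, x.shape[-1])).shape)  # (2, 3, 2, 3): fails the test

The two options agree whenever x.shape[-1] == d_head, so the discrepancy only surfaces in the one test whose tensor violates the doc-string shape.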