C4W2_next_word

Hi,
I have an issue with the implementation of the next_word function. Just for context, I've passed all the tests before, so hopefully the issue is with my implementation itself. Anyway, I'm getting the following error:
2024-08-30 21:27:44.787955: W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at einsum_op_impl.h:506 : INVALID_ARGUMENT: Expected input 0 to have rank 4 but got: 3

InvalidArgumentError Traceback (most recent call last)
Cell In[41], line 10
7 output = tf.expand_dims([tokenizer.word_index["[SOS]"]], 0)
9 # predict the next word with your function
—> 10 predicted_token = next_word(transformer, encoder_input, output)
11 print(f"Predicted token: {predicted_token}")
13 predicted_word = tokenizer.sequences_to_texts(predicted_token.numpy())[0]

Cell In[40], line 21, in next_word(model, encoder_input, output)
18 dec_padding_mask = create_padding_mask(output)
20 # Run the prediction of the next word with the transformer model
—> 21 predictions, attention_weights = model(
22 encoder_input,
23 output,
24 False,
25 enc_padding_mask,
26 look_ahead_mask,
27 dec_padding_mask
28 )
29 ### END CODE HERE ###
31 predictions = predictions[: ,-1:, :]

File /usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback..error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # tf.debugging.disable_traceback_filtering()
—> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb

Cell In[21], line 57, in Transformer.call(self, input_sentence, output_sentence, training, enc_padding_mask, look_ahead_mask, dec_padding_mask)
53 enc_output = self.encoder(input_sentence, training = training, mask = enc_padding_mask)
55 # call self.decoder with the appropriate arguments to get the decoder output
56 # dec_output.shape == (batch_size, tar_seq_len, fully_connected_dim)
—> 57 dec_output, attention_weights = self.decoder(
58 output_sentence,
59 enc_output = enc_output,
60 training = training,
61 look_ahead_mask = look_ahead_mask,
62 padding_mask = dec_padding_mask
63 )
65 # pass decoder output through a linear layer and softmax (~1 line)
66 final_output = self.final_layer(dec_output)

Cell In[18], line 66, in Decoder.call(self, x, enc_output, training, look_ahead_mask, padding_mask)
62 # use a for loop to pass x through a stack of decoder layers and update attention_weights (~4 lines total)
63 for i in range(self.num_layers):
64 # pass x and the encoder output through a stack of decoder layers and save the attention weights
65 # of block 1 and 2 (~1 line)
—> 66 x, block1, block2 = self.dec_layers[i](
67 x,
68 enc_output = enc_output,
69 training = training,
70 look_ahead_mask = look_ahead_mask,
71 padding_mask = padding_mask
72 )
74 #update attention_weights dictionary with the attention weights of block 1 and block 2
75 attention_weights['decoder_layer{}_block1_self_att'.format(i+1)] = block1

Cell In[15], line 58, in DecoderLayer.call(self, x, enc_output, training, look_ahead_mask, padding_mask)
36 """
37 Forward pass for the Decoder Layer
38
(…)
49 attn_weights_block2 (tf.Tensor): Tensor of shape (batch_size, num_heads, target_seq_len, input_seq_len)
50 """
52 ### START CODE HERE ###
53 # enc_output.shape == (batch_size, input_seq_len, fully_connected_dim)
54
55 # BLOCK 1
56 # calculate self-attention and return attention scores as attn_weights_block1.
57 # Dropout will be applied during training (~1 line).
—> 58 mult_attn_out1, attn_weights_block1 = self.mha1(
59 query = x,
60 value = x,
61 attention_mask = padding_mask,
62 return_attention_scores = True,
63 training = training,
64 use_causal_mask = True
65 )
67 # apply layer normalization (layernorm1) to the sum of the attention output and the input (~1 line)
68 Q1 = self.layernorm1(mult_attn_out1+x)

InvalidArgumentError: Exception encountered when calling layer 'query' (type EinsumDense).

{{function_node __wrapped__Einsum_N_2_device/job:localhost/replica:0/task:0/device:GPU:0}} Expected input 0 to have rank 4 but got: 3 [Op:Einsum] name:

Call arguments received by layer 'query' (type EinsumDense):
• inputs=tf.Tensor(shape=(1, 1, 128), dtype=float32)

Your first error lies in the Block 1 code for self-attention: there is a missing x (the key argument). Also, no training argument is supposed to be passed, and you don't need use_causal_mask=True in the Block 1 or Block 2 code.

Also, please make sure that you are working on an updated copy of the assignment.

regards
DP

Hi,
Thank you for your reply.
How do I know if I’m working with the updated copy of the assignment?

Also, I used what you told me to change my code, and now it works.
However, there was no issue with the two x's: by omitting key, I was just letting TF use the same tensor for the key and value arguments. The issue was that I was always using use_causal_mask=True.
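
For anyone else who hits this, here is a quick standalone check (not assignment code) showing that tf.keras.layers.MultiHeadAttention reuses value as the key when key is omitted, so the two calls below are equivalent:

import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)
x = tf.random.uniform((1, 5, 8))  # (batch, seq_len, features)

# With key omitted, the layer reuses `value` as the key.
out_no_key = mha(query=x, value=x)
out_with_key = mha(query=x, value=x, key=x)
print(bool(tf.reduce_all(tf.abs(out_no_key - out_with_key) < 1e-6)))  # True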

Thank you.


Yes, you are right, but I got a DM from you saying that you still failed the graded submission. I hope you have cleared it now?

Hi! Yes, I was finally able to clear it. The problem was in the next_word function: I hadn't realized I needed to add padding to the output (silly mistake). However, training the model doesn't work. There is a problem with the train_step function, but this doesn't affect the grading.
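
Roughly, the kind of padding step I mean is sketched below; the names (decoder_maxlen, the token id) are placeholders, and pad_sequences is just one way to do it, not necessarily the assignment's exact code:

import tensorflow as tf

# Hypothetical partially generated output, e.g. just an [SOS] token id.
output = tf.expand_dims([2], 0)  # shape (1, 1)
decoder_maxlen = 50              # assumed fixed decoder length

# Pad on the right so the mask built from the output has a fixed length.
padded_output = tf.keras.preprocessing.sequence.pad_sequences(
    output.numpy(), maxlen=decoder_maxlen, padding='post'
)
print(padded_output.shape)  # (1, 50)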

The model doesn't work? Can you please share a screenshot showing what this issue is?

Sure!

Epoch 1, Batch 1/231

ValueError Traceback (most recent call last)
Cell In[50], line 18
16 for (batch, (inp, tar)) in enumerate(dataset):
17 print(f'Epoch {epoch+1}, Batch {batch+1}/{number_of_batches}', end='\r')
—> 18 train_step(transformer, inp, tar)
20 print (f'Epoch {epoch+1}, Loss {train_loss.result():.4f}')
21 losses.append(train_loss.result())

File /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback..error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
→ 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb

File /tmp/__autograph_generated_fileuu2pj3j4.py:15, in outer_factory..inner_factory..tf__train_step(model, inp, tar)
13 dec_padding_mask = ag__.converted_call(ag__.ld(create_padding_mask), (ag__.ld(inp),), None, fscope)
14 with ag__.ld(tf).GradientTape() as tape:
—> 15 (predictions, _) = ag__.converted_call(ag__.ld(model), (ag__.ld(inp), ag__.ld(tar_inp), True, ag__.ld(enc_padding_mask), ag__.ld(look_ahead_mask), ag__.ld(dec_padding_mask)), None, fscope)
16 loss = ag__.converted_call(ag__.ld(masked_loss), (ag__.ld(tar_real), ag__.ld(predictions)), None, fscope)
17 gradients = ag__.converted_call(ag__.ld(tape).gradient, (ag__.ld(loss), ag__.ld(transformer).trainable_variables), None, fscope)

File /usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback..error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # tf.debugging.disable_traceback_filtering()
—> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb

File /tmp/__autograph_generated_file3f6zlr7g.py:12, in outer_factory..inner_factory..tf__call(self, input_sentence, output_sentence, training, enc_padding_mask, look_ahead_mask, dec_padding_mask)
10 retval_ = ag__.UndefinedReturnValue()
11 enc_output = ag__.converted_call(ag__.ld(self).encoder, (ag__.ld(input_sentence),), dict(training=ag__.ld(training), mask=ag__.ld(enc_padding_mask)), fscope)
—> 12 (dec_output, attention_weights) = ag__.converted_call(ag__.ld(self).decoder, (ag__.ld(output_sentence),), dict(enc_output=ag__.ld(enc_output), training=ag__.ld(training), look_ahead_mask=ag__.ld(look_ahead_mask), padding_mask=ag__.ld(dec_padding_mask)), fscope)
13 final_output = ag__.converted_call(ag__.ld(self).final_layer, (ag__.ld(dec_output),), None, fscope)
14 try:

File /tmp/__autograph_generated_files9tlle_f.py:36, in outer_factory..inner_factory..tf__call(self, x, enc_output, training, look_ahead_mask, padding_mask)
34 i = ag__.Undefined('i')
35 block1 = ag__.Undefined('block1')
—> 36 ag__.for_stmt(ag__.converted_call(ag__.ld(range), (ag__.ld(self).num_layers,), None, fscope), None, loop_body, get_state, set_state, ('x',), {'iterate_names': 'i'})
37 try:
38 do_return = True

File /tmp/__autograph_generated_files9tlle_f.py:30, in outer_factory..inner_factory..tf__call..loop_body(itr)
28 nonlocal x
29 i = itr
—> 30 (x, block1, block2) = ag__.converted_call(ag__.ld(self).dec_layers[ag__.ld(i)], (ag__.ld(x),), dict(enc_output=ag__.ld(enc_output), training=ag__.ld(training), look_ahead_mask=ag__.ld(look_ahead_mask), padding_mask=ag__.ld(padding_mask)), fscope)
31 ag__.ld(attention_weights)[ag__.converted_call('decoder_layer{}_block1_self_att'.format, ((ag__.ld(i) + 1),), None, fscope)] = ag__.ld(block1)
32 ag__.ld(attention_weights)[ag__.converted_call('decoder_layer{}_block2_decenc_att'.format, ((ag__.ld(i) + 1),), None, fscope)] = ag__.ld(block2)

File /tmp/__autograph_generated_file06gyjiza.py:28, in outer_factory..inner_factory..tf__call(self, x, enc_output, training, look_ahead_mask, padding_mask)
26 switch = ag__.Undefined('switch')
27 ag__.if_stmt((ag__.ld(look_ahead_mask) != None), if_body, else_body, get_state, set_state, ('switch',), 1)
—> 28 (mult_attn_out1, attn_weights_block1) = ag__.converted_call(ag__.ld(self).mha1, (), dict(query=ag__.ld(x), value=ag__.ld(x), attention_mask=ag__.ld(padding_mask), return_attention_scores=True, training=ag__.ld(training), use_causal_mask=ag__.ld(switch)), fscope)
29 Q1 = ag__.converted_call(ag__.ld(self).layernorm1, ((ag__.ld(mult_attn_out1) + ag__.ld(x)),), None, fscope)
30 (mult_attn_out2, attn_weights_block2) = ag__.converted_call(ag__.ld(self).mha2, (), dict(query=ag__.ld(Q1), value=ag__.ld(enc_output), key=ag__.ld(enc_output), attention_mask=ag__.ld(padding_mask), return_attention_scores=True, training=ag__.ld(training)), fscope)

ValueError: in user code:

File "/tmp/ipykernel_735/2638583919.py", line 20, in train_step  *
    predictions, _ = model(
File "/usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler  **
    raise e.with_traceback(filtered_tb) from None
File "/tmp/__autograph_generated_file3f6zlr7g.py", line 12, in tf__call
    (dec_output, attention_weights) = ag__.converted_call(ag__.ld(self).decoder, (ag__.ld(output_sentence),), dict(enc_output=ag__.ld(enc_output), training=ag__.ld(training), look_ahead_mask=ag__.ld(look_ahead_mask), padding_mask=ag__.ld(dec_padding_mask)), fscope)
File "/tmp/__autograph_generated_files9tlle_f.py", line 36, in tf__call
    ag__.for_stmt(ag__.converted_call(ag__.ld(range), (ag__.ld(self).num_layers,), None, fscope), None, loop_body, get_state, set_state, ('x',), {'iterate_names': 'i'})
File "/tmp/__autograph_generated_files9tlle_f.py", line 30, in loop_body
    (x, block1, block2) = ag__.converted_call(ag__.ld(self).dec_layers[ag__.ld(i)], (ag__.ld(x),), dict(enc_output=ag__.ld(enc_output), training=ag__.ld(training), look_ahead_mask=ag__.ld(look_ahead_mask), padding_mask=ag__.ld(padding_mask)), fscope)
File "/tmp/__autograph_generated_file06gyjiza.py", line 28, in tf__call
    (mult_attn_out1, attn_weights_block1) = ag__.converted_call(ag__.ld(self).mha1, (), dict(query=ag__.ld(x), value=ag__.ld(x), attention_mask=ag__.ld(padding_mask), return_attention_scores=True, training=ag__.ld(training), use_causal_mask=ag__.ld(switch)), fscope)

ValueError: Exception encountered when calling layer 'transformer_2' (type Transformer).

in user code:

    File "/tmp/ipykernel_735/2648137580.py", line 57, in call  *
        dec_output, attention_weights = self.decoder(
    File "/usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler  **
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_files9tlle_f.py", line 36, in tf__call
        ag__.for_stmt(ag__.converted_call(ag__.ld(range), (ag__.ld(self).num_layers,), None, fscope), None, loop_body, get_state, set_state, ('x',), {'iterate_names': 'i'})
    File "/tmp/__autograph_generated_files9tlle_f.py", line 30, in loop_body
        (x, block1, block2) = ag__.converted_call(ag__.ld(self).dec_layers[ag__.ld(i)], (ag__.ld(x),), dict(enc_output=ag__.ld(enc_output), training=ag__.ld(training), look_ahead_mask=ag__.ld(look_ahead_mask), padding_mask=ag__.ld(padding_mask)), fscope)
    File "/tmp/__autograph_generated_file06gyjiza.py", line 28, in tf__call
        (mult_attn_out1, attn_weights_block1) = ag__.converted_call(ag__.ld(self).mha1, (), dict(query=ag__.ld(x), value=ag__.ld(x), attention_mask=ag__.ld(padding_mask), return_attention_scores=True, training=ag__.ld(training), use_causal_mask=ag__.ld(switch)), fscope)

    ValueError: Exception encountered when calling layer 'decoder_4' (type Decoder).
    
    in user code:
    
        File "/tmp/ipykernel_735/3237182596.py", line 66, in call  *
            x, block1, block2 = self.dec_layers[i](
        File "/usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler  **
            raise e.with_traceback(filtered_tb) from None
        File "/tmp/__autograph_generated_file06gyjiza.py", line 28, in tf__call
            (mult_attn_out1, attn_weights_block1) = ag__.converted_call(ag__.ld(self).mha1, (), dict(query=ag__.ld(x), value=ag__.ld(x), attention_mask=ag__.ld(padding_mask), return_attention_scores=True, training=ag__.ld(training), use_causal_mask=ag__.ld(switch)), fscope)
    
        ValueError: Exception encountered when calling layer 'decoder_layer_23' (type DecoderLayer).
        
        in user code:
        
            File "/tmp/ipykernel_735/470627807.py", line 61, in call  *
                mult_attn_out1, attn_weights_block1 = self.mha1(
            File "/usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler  **
                raise e.with_traceback(filtered_tb) from None
        
            ValueError: Exception encountered when calling layer 'multi_head_attention_57' (type MultiHeadAttention).
            
            Dimensions must be equal, but are 150 and 49 for '{{node transformer_2/decoder_4/decoder_layer_23/multi_head_attention_57/and}} = LogicalAnd[](transformer_2/decoder_4/decoder_layer_23/multi_head_attention_57/Cast, transformer_2/decoder_4/decoder_layer_23/multi_head_attention_57/MatrixBandPart)' with input shapes: [64,1,150], [1,49,49].
            
            Call arguments received by layer 'multi_head_attention_57' (type MultiHeadAttention):
              • query=tf.Tensor(shape=(64, 49, 128), dtype=float32)
              • value=tf.Tensor(shape=(64, 49, 128), dtype=float32)
              • key=None
              • attention_mask=tf.Tensor(shape=(64, 1, 150), dtype=float32)
              • return_attention_scores=True
              • training=True
              • use_causal_mask=True
        
        
        Call arguments received by layer 'decoder_layer_23' (type DecoderLayer):
          • x=tf.Tensor(shape=(64, 49, 128), dtype=float32)
          • enc_output=tf.Tensor(shape=(64, 150, 128), dtype=float32)
          • training=True
          • look_ahead_mask=tf.Tensor(shape=(1, 49, 49), dtype=float32)
          • padding_mask=tf.Tensor(shape=(64, 1, 150), dtype=float32)
    
    
    Call arguments received by layer 'decoder_4' (type Decoder):
      • x=tf.Tensor(shape=(64, 49), dtype=int32)
      • enc_output=tf.Tensor(shape=(64, 150, 128), dtype=float32)
      • training=True
      • look_ahead_mask=tf.Tensor(shape=(1, 49, 49), dtype=float32)
      • padding_mask=tf.Tensor(shape=(64, 1, 150), dtype=float32)


Call arguments received by layer 'transformer_2' (type Transformer):
  • input_sentence=tf.Tensor(shape=(64, 150), dtype=int32)
  • output_sentence=tf.Tensor(shape=(64, 49), dtype=int32)
  • training=True
  • enc_padding_mask=tf.Tensor(shape=(64, 1, 150), dtype=float32)
  • look_ahead_mask=tf.Tensor(shape=(1, 49, 49), dtype=float32)
  • dec_padding_mask=tf.Tensor(shape=(64, 1, 150), dtype=float32)

I think it is somehow the same kind of issue I was having before. At some point in the process, padding is not being added. I'm trying to fix it by adding an extra cell and correcting the code I'm not supposed to modify. The problem seems simple; I think I can fix it once I find where to make the correction.

Hi @Maur_cd, can I know which graded cell this error was thrown for in next_word? And yet you passed the grader?

Can you share an image of your submission's grader output by clicking on 'Show grader output'?

Also, a request: it is always better to share a screenshot rather than a copy-paste, because some syntax errors cannot be caught from a copy-paste.

Regards
DP

Also, as per your error, it still indicates that your code is incorrect for Block 1. Clearing one particular graded cell while not clearing another can still cause issues.

This is because the model works based on how you call the encoder and decoder; whatever code you write there carries through to the rest of the model. For example, the input you use in the encoder affects the decoder and, further on, your model training. If the model is failing, then there is still an issue in your code as far as the autograder is concerned.

Regards
DP

This error was thrown when I tried to train the model, in section 12, after I had completed all the exercises. I originally had issues with the grader coming from the next_word function, which I had to modify to pass.

Now the error comes from the provided train_step function. I think it is the same type of error: the mask doesn't have the appropriate dimensions. I think it is because the input to the decoder is not being padded before being passed to the masking function.

train_step builds on the code behind your previously reported error, so even if next_word passed the tests, an error in the code you wrote for the encoder or decoder gets caught during training.

So, as mentioned in my first reply, your issue lies in the Block 1 code, which your error log keeps pointing to again and again.
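
To see why, here is a small standalone reproduction using the shapes from your traceback (illustrative only, not assignment code). Keras' MultiHeadAttention combines attention_mask with the causal mask via a logical AND, so an encoder-length padding mask (length 150) cannot be broadcast against a 49 x 49 causal mask, which is exactly the LogicalAnd failure in your log:

import tensorflow as tf

batch, tar_len, inp_len, d_model = 64, 49, 150, 128
mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=16)

x = tf.random.uniform((batch, tar_len, d_model))  # decoder input embeddings
enc_padding_mask = tf.ones((batch, 1, inp_len))   # shape (64, 1, 150)

try:
    # Mixing the encoder-length mask with use_causal_mask reproduces the error.
    mha(query=x, value=x, attention_mask=enc_padding_mask, use_causal_mask=True)
except Exception as e:
    print(type(e).__name__)  # the (64, 1, 150) and (1, 49, 49) masks don't broadcast

# A target-length mask keeps the shapes consistent for decoder self-attention.
look_ahead_mask = tf.linalg.band_part(tf.ones((1, tar_len, tar_len)), -1, 0)
out = mha(query=x, value=x, attention_mask=look_ahead_mask)
print(out.shape)  # (64, 49, 128)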

What you could do, for a better understanding of this bug, is save a copy of the assignment you are working on, then get a fresh copy and add your code only between the ### START CODE HERE ### and ### END CODE HERE ### markers.

MAKE SURE YOU READ THE INSTRUCTIONS FOR MultiHeadAttention CAREFULLY. You can share a screenshot of your MultiHeadAttention code by personal DM if you want.

Regards
DP

To get a fresh copy, please follow the link in the comment below.

Yeah, thank you. I found what I was doing wrong: I was passing a padding mask to the first multi-head attention layer in the decoder. Now everything is running.
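
In case it helps someone else, the shape of the fix looks roughly like the sketch below (illustrative only, with made-up layer sizes, not the assignment's DecoderLayer): Block 1 self-attention only needs the target-length look-ahead mask, while the padding mask belongs in Block 2, where the decoder attends over the padded encoder output.

import tensorflow as tf

class TinyDecoderBlock1(tf.keras.layers.Layer):
    """Illustration only: causal self-attention over the target sequence."""

    def __init__(self, d_model=128, num_heads=2):
        super().__init__()
        self.mha1 = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model // num_heads)
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

    def call(self, x, look_ahead_mask):
        # The look-ahead mask has shape (1, tar_len, tar_len), matching the
        # query/value lengths, so no padding mask is involved here.
        attn_out, attn_weights = self.mha1(
            query=x, value=x, key=x,
            attention_mask=look_ahead_mask,
            return_attention_scores=True,
        )
        return self.layernorm1(attn_out + x), attn_weights

tar_len = 49
x = tf.random.uniform((64, tar_len, 128))
look_ahead_mask = tf.linalg.band_part(tf.ones((1, tar_len, tar_len)), -1, 0)
out, weights = TinyDecoderBlock1()(x, look_ahead_mask)
print(out.shape, weights.shape)  # (64, 49, 128) (64, 2, 49, 49)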


The bugger :crazy_face:
